PBS 150 of X: Script Plumbing (Bash)
This is take two of PBS 150 - the shownotes have been nearly entirely rewritten and we also rerecorded the podcast. If this is your first time on PBS 150, it won’t matter to you but if you were here before and it looks different, that’s why. We were not happy with the way the show went and because we have a commitment to quality for the learners in this series, we present to you a much improved (and longer) lesson. There is one entirely new segment near the end entitled “Direct Access to the Keyboard and Terminal”.
In the previous instalment we learned how to handle command line arguments with the same level of finesse typical terminal commands do, including with -x
style named options. At this stage our scripts are almost first-class citizens of Terminal-land, they just need to learn one more trick — to play nice with stream redirection, or as I like to call it, Terminal plumbing 🙂
Matching Podcast Episode
Listen along to this instalment on episode 767 of the Chit Chat Across the Pond Podcast.
You can also Download the MP3
Read an unedited, auto-generated transcript: CCATP_2023_05_13
Episode Resources
- The instalment ZIP file — pbs150.zip
PBS 149 Challenge Solution
The optional challenge at the end of the previous instalment was to update our breakfast menu script so it accepts two options:
- -l LIMIT, where LIMIT is the maximum number of items that can be ordered, which must be a whole number greater than or equal to one.
- -s to request some snark.
Here's my sample solution, which you'll find in the instalment ZIP as pbs149-challengeSolution.sh:
#!/usr/bin/env bash

# read the menu
declare -a menu
while read -r menuLine
do
    # skip comment lines
    echo "$menuLine" | egrep -q '^[ ]*#' && continue

    # skip empty lines
    echo "$menuLine" | egrep -q '^[ ]*$' && continue

    # store the menu item
    menu+=("$menuLine")
done <<<"$(cat "$(dirname "$BASH_SOURCE")/menu.txt")"

# initialise the options to their default values
limit='' # no limit
snark='' # no snark

# Process the commandline options
while getopts ':l:s' opt
do
    case $opt in
        l)
            # validate and store the limit
            if echo "$OPTARG" | egrep -q '^[1-9][0-9]*$'
            then
                limit=$OPTARG
            else
                echo "invalid limit - must be an integer greater than zero"
                exit 2
            fi
            ;;
        s)
            snark=1
            ;;
        ?)
            echo "Usage: $(basename "$0") [-s] [-l LIMIT]"
            exit 2
            ;;
    esac
done

# create an empty array to hold the order
declare -a order

# present the menu, with a done option
if [[ -z $limit ]]
then
    if [[ -n $snark ]]
    then
        echo "Wha' d' ya want (greed is grand)?"
    else
        echo 'Choose your breakfast (as many items as you like)'
    fi
else
    if [[ -n $snark ]]
    then
        echo "Pick no more than $limit things, and make it snappy!"
    else
        echo "Choose up to $limit breakfast items"
    fi
fi
select item in done "${menu[@]}"
do
    # skip invalid selections ($item is empty)
    [[ -z $item ]] && continue

    # exit if done
    [[ $item == done ]] && break

    # store and print the item
    order+=("$item")
    if [[ -n $snark ]]
    then
        echo "Fine, you can have $item"
    else
        echo "Added $item to your order"
    fi

    # if we're limiting, check the limit
    if [[ -n $limit ]]
    then
        [[ ${#order[@]} -ge $limit ]] && break
    fi
done

# print the order
if [[ -n $snark ]]
then
    echo -e "\nYour sodding ${#order[@]} items:"
else
    echo -e "\nYou ordered the following ${#order[@]} items:"
fi
for item in "${order[@]}"
do
    echo "* $item"
done
This solution is very much by the book; there isn't really anything particularly worth highlighting beyond noting that the part that changed most is the section that processes the command line arguments:
# initialise the options to their default values
limit='' # no limit
snark='' # no snark

# Process the commandline options
while getopts ':l:s' opt
do
    case $opt in
        l)
            # validate and store the limit
            if echo "$OPTARG" | egrep -q '^[1-9][0-9]*$'
            then
                limit=$OPTARG
            else
                echo "invalid limit - must be an integer greater than zero"
                exit 2
            fi
            ;;
        s)
            snark=1
            ;;
        ?)
            echo "Usage: $(basename "$0") [-s] [-l LIMIT]"
            exit 2
            ;;
    esac
done
This is a textbook example of using getopts for command line options. The first argument to getopts is the definition string, in this case ':l:s', and the second is the name we choose to give to the variable that will hold the option name on each pass through the while loop. In terms of the variable name, I've stuck to the convention I recommended of always naming it $opt. Given how cryptic they appear, definition strings are worth a closer look. The leading : means we are taking responsibility for communicating error messages to the user, the l: means we support an optional argument named l, and the s means we support a flag named s. As a reminder, optional arguments are options that need values, and flags are those that do not.
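To make the effect of that leading : concrete, here's a minimal sketch you could save as a script and poke at (the name optDemo.sh is hypothetical):

#!/usr/bin/env bash

# with 'l:s' (no leading colon) getopts would print its own complaint for a
# bad option, something like: ./optDemo.sh: illegal option -- x
# with ':l:s' (leading colon) getopts stays silent, sets $opt to '?', and
# puts the offending option letter into $OPTARG so we can report it ourselves
while getopts ':l:s' opt
do
    case $opt in
        l) echo "limit set to $OPTARG" ;;
        s) echo 'snark requested' ;;
        \?) echo "Unknown option: -$OPTARG" ;;
    esac
done

Try it with a bogus option like -x and you'll get the custom message rather than Bash's built-in one.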
The POSIX Foundations of Stream Redirection in Bash
In this instalment we’re going to focus on stream redirection from the point of view of a script author, not from the point of view of a terminal user. You’ll find the user’s point of view well covered in our Taming the Terminal Podcast instalments 15 & 16.
It’s important to note that the fundamental concepts we’ll be building on today are not Bash specific; they’re not even specific to shell scripts in general. They are a fundamental part of the POSIX specification, and they’re just as important to developers writing full-blown apps in compiled languages like C.
Streams, File Descriptors & Files
In a POSIX-compliant operating system, the OS represents flows of data into, out of, and between processes as streams. The OS maintains a list of streams each process has access to, and presents the ends of those streams to those processes as numeric IDs which are confusingly named file descriptors. These are not files, we’ll get to those shortly!
Each process gets its own set of file descriptors (numeric IDs), starting at zero. If a stream is used to connect one process to another, then it’s more likely than not that each process will know the same stream by different numeric IDs.
From a process's point of view, each file descriptor (each end of a stream) has a direction. A file descriptor can be used to receive input, i.e. be readable, or to accept output, i.e. be writeable. It's possible for streams to be bidirectional, so file descriptors can be simultaneously readable and writeable.
POSIX-compliant OSes provide each process with three standard file descriptors that always have the same numbering:
- Standard Input, or STDIN, which has the numeric ID 0. This is a readable file descriptor that's connected to the process's terminal session by default. In other words, when a terminal window has focus, the keyboard provides the standard input stream to the process that's running in that terminal.
- Standard Output, or STDOUT, which has the numeric ID 1. This is a writeable file descriptor that's connected to the terminal session the process is running in by default.
- Standard Error, or STDERR, which has the numeric ID 2, which is also writeable and connected to the process's terminal by default.
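As a tiny illustration of those defaults, the two lines below are safe to paste straight into a terminal:

# read pulls a line from STDIN (the keyboard by default)...
read -r reply
# ...and echo pushes a line to STDOUT (the terminal by default)
echo "you typed: $reply"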
Files
In the POSIX universe, the concept of a file is used to represent much more than just regular data files. All sorts of related concepts are managed by POSIX OSes as special files. The special files you’re probably most familiar with are directories, which Mac users would generally call folders. There are also symbolic links (AKA symlinks) and hard links which allow one file to act as an alias for another. But the rabbit hole goes much deeper! Special files are used to represent just about anything that can provide or accept streams of data. This includes established network connections, and their similar local cousin the socket, but of most relevance to us, that also includes the ends of streams, and devices.
Special files representing devices are generally added to the file system under /dev, but that's just a convention.
There are special files representing the end points for special data flows, including:
- /dev/null is a writeable special file that accepts and then discards information — it's a kind of black hole for data!
- /dev/zero is a readable stream of infinite zeros (there is no /dev/one)
- /dev/random is a readable stream of random data
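These special files are safe to experiment with from the terminal. A few illustrative commands (head and xxd are standard on both macOS and Linux, and the 2> syntax for redirecting standard error gets explained properly below):

# hide an error message by sending it into the black hole
ls /etc/hosts /no/such/file 2>/dev/null

# read 8 bytes of zeros and show them in hex
head -c 8 /dev/zero | xxd

# read 8 random bytes and show them in hex (different every time)
head -c 8 /dev/random | xxd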
Most file paths exist universally, that is to say, they have a global scope, but some special file paths are scoped locally, that is to say, each process gets its own copy of these special files. The three most common process-specific special files are:
- /dev/stdin, which represents the readable end of the process's standard input stream.
- /dev/stdout, which represents the writeable side of the process's standard output stream.
- /dev/stderr, which represents the writeable side of the process's standard error stream.
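To sketch how these behave, the commands below use the redirection operators we're about to meet; each is equivalent to a simpler command, it just names the stream explicitly:

# the same as a plain echo, just explicit about the destination
echo 'hello' >/dev/stdout

# copies the keyboard to the terminal until you type the EOF character (ctrl+d)
cat /dev/stdin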
Stream Redirection
The POSIX specification allows processes to create streams and remap streams, so command shells like Bash can use the POSIX APIs to manipulate streams. It’s these stream manipulations that I refer to as terminal plumbing.
Stream Redirection in Bash
Bash provides a number of operators for manipulating streams. These operators adjust the plumbing on a command before the command executes. We’ll look at the most important of these operators in a moment, but there are two things to note — firstly, this is not an exhaustive list, and secondly, later versions of Bash have added new redirection operators that are so-called syntactic sugar. They add no new capabilities, but they allow for nicer (usually shorter) syntaxes. To keep this series as cross-platform as possible I’ll be ignoring these sugary treats 🙂
Note that these operators work on files and file descriptors. Remember that file descriptors are numeric IDs representing the ends of streams, and that files can be regular files or special files, including devices.
The < Operator: File → Readable File Descriptor
The < operator connects a file to an input stream. In its most generic form, the operator has the following syntax:
FILE_DESCRIPTOR<FILE
However, the file descriptor is optional and has a very sensible default, 0 (STDIN), and that's almost always what you want, so you'll usually see the operator used like so:
wc -l </etc/hosts
The > and >> Operators: Writeable File Descriptor → File
The difference between > and >> is that > replaces the content of the file, while >> appends to it. Both operators will create a new file if one doesn't exist already.
These operators connect the writeable side of a stream to a file, and the generic syntax is:
FILE_DESCRIPTOR>FILE
FILE_DESCRIPTOR>>FILE
Again, the file descriptor is optional, and has the sensible default of 1 (STDOUT). These operators are often used to redirect error messages, i.e. with the file descriptor 2 for STDERR, so you'll commonly see examples like:
# standard out to new file (replace)
someScript.sh >output.log
# standard out appended to file
someScript.sh >>output.log
# standard error to new file (replace)
someScript.sh 2>error.log
# standard error appended to file
someScript.sh 2>>error.log
The | Operator: The Command Pipeline
This redirection operator is used to connect two processes together, and the general syntax is:
COMMAND1 | COMMAND2
The operator connects the writeable end of the first command's standard output stream to the readable end of the second command's standard input stream, i.e. the first command's file descriptor 1 becomes the second command's file descriptor 0.
For example:
grep local /etc/hosts | wc -l
File Descriptor Aliases
To make it easy to merge streams, Bash makes special file names available for all file descriptors. They're simply the file descriptor's numeric ID prefixed with an ampersand (&), so the file descriptor 0 can be referenced as the file &0.
This can get confusing, but just remember that whenever the Bash spec says you should provide a file descriptor, you use the bare number, and when it says you should provide a file, you use the number prefixed with an ampersand.
Merging Standard Error into Standard Out
To give a practical example of using the file version of a file descriptor, let's see how we would merge standard error into standard out. We're not connecting two processes, so we can't use the | operator, and these are output streams, so we can't use <, so that just leaves > and >>. Appending to a stream makes no sense, so by process of elimination we must need >.
The spec for > requires a source file descriptor and a destination file. We want to merge standard error into standard out, so the source is the file descriptor 2. That needs to go to a file, but we want to use the file descriptor 1, so to do that we prefix it with an &. Putting this all together, to redirect standard error to standard out, we use 2>&1.
You Can do Multiple Redirections
It's perfectly fine to define multiple redirections on a single command, and to combine the redirection operators with the pipeline operator. Just remember that a pipeline joins commands, so redirections on the left of a pipe only apply to the command on the left, and redirections on the right only to the command on the right.
For example, to send the regular output from a script to one file, and the errors to another you can use commands of the form:
someScript.sh >output.log 2>error.log
And, if you want to send both regular and error output to the next command on the pipeline you need to merge the output and error before the pipeline, so your commands would take the form:
someScript.sh 2>&1 | someOtherScript.sh
Experimenting with Redirection and Our Scripts
To see all this in action, let's make use of my sample solution for the previous challenge (pbs149-challengeSolution.sh in the instalment ZIP).
Redirecting a File Into a Script
The < operator is used to send the contents of a file into a script as its standard input, so we'll redirect the file favouriteOrder.txt in the instalment ZIP into my sample solution. Before we do that, let's quickly look at its contents:
2
3
1
You'll notice these correspond to the menu options pancakes, waffles, and done.
So let’s go ahead and redirect this file into the script:
./pbs149-challengeSolution.sh <favouriteOrder.txt
What you'll see is that the script behaves as if we had typed 2 the first time we were offered the menu, then 3 the second time, and finally 1 the third time, i.e. we order pancakes, waffles, and then finish our order.
The contents of the file took the place of our keyboard.
Redirecting the Output of a Script to a File
Next, let's redirect the output from the script to a file named transcript.txt with the > operator (while continuing to use favouriteOrder.txt as the input):
./pbs149-challengeSolution.sh <favouriteOrder.txt >transcript.txt
The first thing you'll notice is that the terminal is not showing most of the output, because standard out has been redirected to the file (cat transcript.txt):
Choose your breakfast (as many items as you like)
Added pancakes to your order
Added waffles to your order
You ordered the following 2 items:
* pancakes
* waffles
Notice that the output has been bifurcated: some went to the terminal, some to the file, and none to both.
The docs for the select command reveal the reason for this:
“… the set of expanded words is printed on the standard error output stream …”
In other words, the menu was sent to standard error, and we only sent standard out to the file, so standard error was still connected to the terminal session.
Writing to STDERR rather than STDOUT
At the moment the question (‘Choose your breakfast …’) is being sent to standard out, but the options to standard error. That’s silly, so let’s also send the question to standard error where the options are going.
To do that we just need to redirect the relevant echo command's standard output to standard error with the > operator. We could use the long form of the operator (1>&2), but since standard out is the default, we can shorten it to:
echo 'Choose your breakfast (as many items as you like)' >&2
Other similar feedback related to the select loop should be redirected in the same way. You'll find a version of the script with all the appropriate echo commands redirected to STDERR in the file pbs150a-menu.sh in the instalment ZIP.
We can run our favourite order through this updated script with:
./pbs150a-menu.sh <favouriteOrder.txt >transcript.txt
And now we get a more useful transcript:
You ordered the following 2 items:
* pancakes
* waffles
Better yet, we can use the command interactively, even when capturing the output in the transcript file. Try it with:
./pbs150a-menu.sh >transcript.txt
So, think carefully about what output your script sends to STDOUT, and what output it sends to STDERR.
Scripts in Pipelines
Now let's use our script in a pipeline. First, we'll receive an order from an echo command:
echo -e "2\n3\n1" | ./pbs150a-menu.sh
We can of course then go on to redirect the output from our script to another script or terminal command by adding another pipe; in this case, we can count the lines the script writes to standard out with:
echo -e "2\n3\n1" | ./pbs150a-menu.sh | wc -l
You'll see the answer is 4, but the output to STDERR coming from our script is confusing things.
Sending Errors into Oblivion
Let's hide the unwanted standard error output in the previous command by redirecting it (file descriptor 2) to the null device (/dev/null):
echo -e "2\n3\n1" | ./pbs150a-menu.sh 2>/dev/null | wc -l
Now our line count is much clearer!
Some Scripting Considerations
File Descriptors are Inherited
When your script calls another terminal command, that command inherits the three standard file descriptors (STDIN, STDOUT & STDERR) from your script. This means that any command you call that writes to STDOUT goes to wherever your script is outputting, and any command you call that tries to read from STDIN will read from wherever your script is getting its standard input.
This is why the read and select commands in our sample solution worked when we redirected our text file to the script's input stream.
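Here's a minimal sketch of that inheritance at work (the script name pbs150-inheritDemo.sh is hypothetical, it's not in the instalment ZIP):

#!/usr/bin/env bash

# neither command below mentions a file, but both are affected when the
# script itself is redirected, because they inherit the script's streams

# date inherits the script's STDOUT
date

# wc inherits the script's STDIN
wc -l

Run it as ./pbs150-inheritDemo.sh </etc/hosts >out.txt and out.txt will contain the date followed by the number of lines in /etc/hosts; both commands silently used the script's plumbing.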
Reading Streams is Destructive
When you read a character from an input stream it’s gone from that stream! That’s true of all streams, including the standard input stream.
Don't worry, this does not mean regular files get deleted by reading them! When you redirect a regular file to a stream, the contents of the file get copied into the stream, and it's that copy that gets consumed as your script reads from the stream.
What this does mean is that you only get one shot at reading any stream, so you need to capture and save anything you need later!
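A minimal sketch makes this tangible (the script name pbs150-readTwice.sh is hypothetical):

#!/usr/bin/env bash

# the first cat drains everything waiting on STDIN...
first=$(cat)

# ...so by the time this cat runs there is nothing left to read
second=$(cat)

echo "first read:  $first"
echo "second read: $second"

Try echo 'hello' | ./pbs150-readTwice.sh and you'll see the text only makes it into the first variable; the second read comes up empty.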
Capturing STDIN
We've already seen that the read command takes its information from STDIN, but it does so in pieces, not all at once. By default it reads line-by-line; more precisely, read copies characters from STDIN into a variable until it meets its delimiter character, which defaults to the newline but can be changed with the -d option, and the POSIX variable $IFS (the input field separator) then controls how what was read gets split into fields. This means you can use read to slurp in all the data on STDIN at once by setting the delimiter to the empty string and emptying $IFS so nothing gets split or trimmed. I'm not a fan of this approach because it means I have to remember to make my change to $IFS in such a way that I avoid 'spooky action at a distance' bugs.
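For completeness, here's what that read-based slurp looks like. Note that $IFS is emptied only for this single command, which is the safest way to avoid those scoping bugs, and that read exits with a non-zero status when it hits the end of the input, so don't combine this naively with error-exit logic:

# slurp all of STDIN with read: -r protects backslashes, -d '' means
# 'keep going until EOF', and the leading IFS= scopes the change to
# this one command only
IFS= read -rd '' theInput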
My preferred method for reading all of STDIN into a variable at once, known colloquially as slurping the standard input, is to use the cat command with no arguments. As its man page says, when passed no arguments it writes STDIN to STDOUT. Because cat will inherit our script's STDIN, it will read what we want it to read, and we can then use the $() syntax to store the command's output in a variable.
We can see this in action with the simple script pbs150b-naiveShouter.sh in the instalment ZIP:
#!/usr/bin/env bash
# always read STDIN into a variable
theInput=$(cat)
# convert the input to all upper case with the tr command
theOutput=$(echo "$theInput" | tr '[:lower:]' '[:upper:]')
# output the upper-cased text
echo "$theOutput"
This script slurps all of STDIN, converts it to upper case using the tr (translate) command, then outputs the upper-cased, i.e. shouty, text to STDOUT.
Notice how we read the input into $theInput with the simple line:
theInput=$(cat)
We can see this script in action with the command:
echo 'hello world!' | ./pbs150b-naiveShouter.sh
But what happens if we try to run this script without redirecting something into STDIN? The script hangs! It's waiting for input from the keyboard, and because we're using cat rather than read, it will keep waiting even after we hit return. In fact, it will keep waiting for input until we enter the EOF (End of File) character, which we can type with ctrl+d. Try it:
./pbs150b-naiveShouter.sh
Intelligently Slurping STDIN
That's clearly suboptimal, so the obvious question you're probably asking is 'how do we detect whether or not STDIN has content for us to slurp?'.
I’m sorry to say, if you need your script to be portable, and to work everywhere, you can’t. 🙁
If you know your script will always be run with modern versions of Bash, i.e. never on a Mac (Apple has a philosophical objection to GPLv3), then you can use the exit code from read -t 0 to detect whether or not there is data to be read on STDIN, but alas that doesn't work in older versions of Bash, like the one Apple still ships in even the very latest macOS versions.
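For the record, here's a sketch of that Bash-4-or-later approach; bear in mind read -t 0 only reports whether data is available at that exact instant, so it can misfire if the process feeding your pipe is slow to start writing:

# the exit status of read -t 0 is 0 only if data is already waiting on STDIN
# (requires Bash >= 4, so this won't work with macOS's built-in Bash 3.2)
if read -t 0
then
    theInput=$(cat) # safe to slurp
else
    echo 'no input waiting on STDIN' >&2
fi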
That does not mean we can't write user-friendly scripts — we can simply copy the behaviour of standard terminal commands like cat and grep.
Both of these commands are examples of a common convention used by most terminal commands that need a stream of input to work on.
They implement a simple two-rule algorithm:
- If zero files are specified, read from STDIN.
- If one or more files are specified, check each passed file path to see if it's the string -, and if it is, read from STDIN, otherwise, read from the specified file.
Because this is a very common convention, getopts does not get confused by the lone -, which is used to indicate the script should read from STDIN.
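You can see the convention in action with cat itself, mixing a regular file and STDIN in a single command (using the menu.txt file from the instalment ZIP):

# cat reads menu.txt first, then the - tells it to read the piped text
echo 'and some text from the pipe' | cat menu.txt -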
To see this in action, let's update our shouting script to implement the cat convention, and to support an optional -b flag to request an exclamation point (a bang) be appended to the output. You'll find this script in the instalment ZIP as pbs150c-shouter.sh:
#!/usr/bin/env bash

# start by processing the options
bang='' # assume no bang
while getopts ":b" opt
do
    case $opt in
        b)
            bang='!'
            ;;
        ?)
            echo "Usage: $(basename "$0") [-b] [FILE|-]..."
            exit 2
            ;;
    esac
done
shift $(echo "$OPTIND-1" | bc)

#
# -- load the text to shout --
#
text=''

# if there are no args, read STDIN
if [[ $# = 0 ]]
then
    text+=$(cat)
fi

# loop over all args and read from files or STDIN as appropriate
for path in "$@"
do
    # determine whether to treat as file or STDIN
    if [[ $path = '-' ]]
    then
        # read from STDIN
        text+=$(cat)
    else
        # read from file
        text+=$(cat "$path")
    fi
done

# convert the text to all upper case with the tr command
shoutedText=$(echo "$text" | tr '[:lower:]' '[:upper:]')

# output the upper-cased text and its optional bang
echo "$shoutedText$bang"
We can now see this script behave like cat with the following examples:
# no args means assume STDIN
echo 'hello world' | ./pbs150c-shouter.sh
echo 'hello world' | ./pbs150c-shouter.sh -b
# regular file args are read as files
./pbs150c-shouter.sh menu.txt
./pbs150c-shouter.sh -b menu.txt
# the - can be added to the file list to read STDIN and regular files
echo -e "\n\nhello world" | ./pbs150c-shouter.sh -b menu.txt -
Direct Access to the Keyboard and Terminal
By default, Bash's various commands for reading input from the user and writing output use the three standard streams. Most of the time that's fine, but what if you really need to write a message to the terminal, or to read input from the keyboard, regardless of whether or not your script is in a pipeline? Can you access the terminal directly?
Why would you want to do this? A great example is invoking an SQL command on a database server that needs a password. It's reasonable to redirect a file with SQL statements into the command, and to send the output to a log file and the errors to a separate error log, so that means standard in is coming from the SQL file, and standard out and standard error are going to their separate log files. How can the SQL command tell you it needs a password, and let you enter one? If you try with mysql you'll see that it does still somehow manage to write the password prompt to your terminal, and to read your password from the keyboard. How?
The POSIX standard specifies that each process should get its own copy of a special file named /dev/tty, which is a readable and writeable reference to the terminal session the process is running in. This means you can always read from the keyboard by using </dev/tty, and you can always write to the terminal with >/dev/tty.
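As a sketch of this in practice, the little script below (hypothetically named pbs150-askSecret.sh, it's not in the instalment ZIP) prompts for and reads a password via the terminal, no matter how its three standard streams have been redirected:

#!/usr/bin/env bash

# write the prompt straight to the terminal, bypassing STDOUT
echo -n 'Password: ' >/dev/tty

# read straight from the keyboard, bypassing STDIN
# (-s is a Bash nicety that hides the typed characters)
read -rs password </dev/tty
echo '' >/dev/tty # move past the hidden typing

# the rest of the script can use the password as normal
echo "received a ${#password} character password"

Even a command like echo 'ignored' | ./pbs150-askSecret.sh >/dev/null still shows the prompt and reads your typing, because /dev/tty side-steps the plumbing entirely.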
Final Thoughts
We’re nearing the end of our Bash journey now — we can write shell scripts that behave just like standard terminal commands. They can play nice in the pipeline, they can accept options in the traditional way, and they’ll run reliably on a very broad range of OSes.
What’s left to do? Mostly, just a little housekeeping!
I want to finish the series by making some things we’ve been doing implicitly explicit, specifically the different kinds of expansions Bash can do, and the exact behaviour of the various kinds of brackets that can be used to group things.
But, before doing that, I want to look at controlling scope in a little more detail, and at better output formatting with printf rather than echo. That's where we'll start next time.
An Optional Challenge
Update your challenge solution from last time so it can ingest its menu in three ways:
- Default to reading the menu from a file named menu.txt in the same folder as the script.
- Read the menu from STDIN with the optional argument -m -.
- Read the menu from a file with the optional argument -m path_to_file.txt.
Regardless of where the menu is coming from, always present the menu interactively, i.e., the user always has to choose using the keyboard.
Finally, to simplify your script, feel free to remove the -s
(snark) flag added last time.