
Programming by Stealth

A blog and podcast series by Bart Busschots & Allison Sheridan.

PBS 150 of X — Bash: Script Plumbing

This is take two of PBS 150: the shownotes have been nearly entirely rewritten, and we also rerecorded the podcast. If this is your first time on PBS 150 it won’t matter to you, but if you were here before and it looks different, that’s why. We were not happy with the way the show went, and because we have a commitment to quality for the learners in this series, we present to you a much improved (and longer) lesson. There is one entirely new segment near the end entitled “Direct Access to the Keyboard and Terminal”.

In the previous instalment we learned how to handle command line arguments with the same level of finesse typical terminal commands do, including -x style named options. At this stage our scripts are almost first-class citizens of Terminal-land; they just need to learn one more trick — to play nice with stream redirection, or as I like to call it, Terminal plumbing 🙂

Matching Podcast Episode

Listen along to this instalment on episode 767 of the Chit Chat Across the Pond Podcast.

You can also Download the MP3

Read an unedited, auto-generated transcript: CCATP_2023_05_13

Episode Resources

PBS 149 Challenge Solution

The optional challenge at the end of the previous instalment was to update our breakfast menu script so it accepts two options:

  1. -l LIMIT where LIMIT is the maximum number of items that can be ordered, which must be a whole number greater than or equal to one.
  2. -s to request some snark.

Here’s my sample solution, which you’ll find in the instalment ZIP as pbs149-challengeSolution.sh:

#!/usr/bin/env bash

# read the menu
declare -a menu
while read -r menuLine
do
    # skip comment lines
    echo "$menuLine" | egrep -q '^[ ]*#' && continue

    # skip empty lines
    echo "$menuLine" | egrep -q '^[ ]*$' && continue

    # store the menu item
    menu+=("$menuLine")
done <<<"$(cat $(dirname "$BASH_SOURCE")/menu.txt)"

# initialise the options to their default values
limit='' # no limit
snark='' # no snark

# Process the commandline options
while getopts ':l:s' opt
do
    case $opt in
        l)
            # validate and store the limit
            if echo "$OPTARG" | egrep -q '^[1-9][0-9]*$'
            then
                limit=$OPTARG
            else
                echo "invalid limit - must be an integer greater than zero"
                exit 2
            fi
            ;;
        s)
            snark=1
            ;;
        ?)
            echo "Usage: $(basename $0) [-s] [-l LIMIT]"
            exit 2
    esac
done

# create an empty array to hold the order
declare -a order

# present the menu, with a done option
if [[ -z $limit ]]
then
    if [[ -n $snark ]]
    then
        echo "Wha' d' ya want (greed is grand)?"
    else
        echo 'Choose your breakfast (as many items as you like)'
    fi
else
    if [[ -n $snark ]]
    then
        echo "Pick no more than $limit things, and make it snappy!"
    else
        echo "Choose up to $limit breakfast items"
    fi
fi
select item in done "${menu[@]}"
do
    # skip invalid selections ($item is empty)
    [[ -z $item ]] && continue

    # exit if done
    [[ $item == done ]] && break

    # store and print the item
    order+=("$item")
    if [[ -n $snark ]]
    then
        echo "Fine, you can have $item"
    else
        echo "Added $item to your order"
    fi

    # if we're limiting, check the limit
    if [[ -n $limit ]]
    then
        [[ ${#order[@]} -ge $limit ]] && break
    fi
done

# print the order
if [[ -n $snark ]]
then
    echo -e "\nYour sodding ${#order[@]} items:"
else
    echo -e "\nYou ordered the following ${#order[@]} items:"
fi
for item in "${order[@]}"
do
    echo "* $item"
done

This solution is very much by the book, and there isn’t much worth highlighting beyond noting that the section of the code that changed most is the one that processes the command line arguments:

# initialise the options to their default values
limit='' # no limit
snark='' # no snark

# Process the commandline options
while getopts ':l:s' opt
do
    case $opt in
        l)
            # validate and store the limit
            if echo "$OPTARG" | egrep -q '^[1-9][0-9]*$'
            then
                limit=$OPTARG
            else
                echo "invalid limit - must be an integer greater than zero"
                exit 2
            fi
            ;;
        s)
            snark=1
            ;;
        ?)
            echo "Usage: $(basename $0) [-s] [-l LIMIT]"
            exit 2
    esac
done

This is a textbook example of using getopts for command line options. The first argument to getopts is the definition string, in this case ':l:s', and the second is the name we choose to give to the variable that will hold the option name on each pass through the while loop. In terms of the variable name, I’ve stuck to the convention I recommended of always naming it $opt. Given how cryptic it appears, the definition string is worth a closer look. The leading : means we are taking responsibility for communicating error messages to the user, the l: means we support an optional argument named l, and the s means we support a flag named s. As a reminder, optional arguments are options that need values, and flags are those that do not.

The POSIX Foundations of Stream Redirection in Bash

In this instalment we’re going to focus on stream redirection from the point of view of a script author, not from the point of view of a terminal user. You’ll find the user’s point of view well covered in our Taming the Terminal Podcast instalments 15 & 16.

It’s important to note that the fundamental concepts we’ll be building on today are not Bash specific; they’re not even specific to shell scripts in general. They are a fundamental part of the POSIX specification, and they’re just as important to developers writing full-blown apps in compiled languages like C.

Streams, File Descriptors & Files

In a POSIX-compliant operating system, the OS represents flows of data into, out of, and between processes as streams. The OS maintains a list of streams each process has access to, and presents the ends of those streams to those processes as numeric IDs which are confusingly named file descriptors. These are not files, we’ll get to those shortly!

Each process gets its own set of file descriptors (numeric IDs), starting at zero. If a stream is used to connect one process to another, then it’s more likely than not that each process will know the same stream by different numeric IDs.

From a process’s point of view, each file descriptor (each end of a stream) has a direction. A file descriptor can be used to receive input, i.e. be readable, and it can accept output, i.e. be writeable. It’s possible for streams to be bidirectional, so file descriptors can be simultaneously readable and writeable.

POSIX-compliant OSes provide each process with three standard file descriptors that always have the same numbering:

  1. Standard Input, or STDIN which has the numeric ID 0. This is a readable file descriptor that’s connected to the process’s terminal session by default. In other words, when a terminal window has focus, the keyboard provides the standard input stream to the process that’s running in that terminal.
  2. Standard Output, or STDOUT which has the numeric ID 1. This is a writeable file descriptor that’s connected to the terminal session the process is running in by default.
  3. Standard Error, or STDERR which has the numeric ID 2, which is also writeable and connected to the process’s terminal by default.
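You can actually see these three file descriptors for yourself. On Macs and most Linux distros the special directory /dev/fd lists a process’s open file descriptors (on Linux it’s an alias for /proc/self/fd) — a quick sketch, not something we’ll need again:

# list the open file descriptors for the process running this command;
# 0, 1 & 2 all point at the terminal device by default
ls -l /dev/fd/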

Files

In the POSIX universe, the concept of a file is used to represent much more than just regular data files. All sorts of related concepts are managed by POSIX OSes as special files. The special files you’re probably most familiar with are directories, which Mac users would generally call folders. There are also symbolic links (AKA symlinks) and hard links, which allow one file to act as an alias for another. But the rabbit hole goes much deeper! Special files are used to represent just about anything that can provide or accept streams of data. This includes established network connections and their local cousin, the socket, but of most relevance to us, it also includes the ends of streams, and devices.

Special files representing devices are generally added to the file system under /dev, but that’s just a convention.

There are special files representing the end points for special data flows, including:

  1. /dev/null, the null device, which silently discards anything written to it, and returns nothing but an end-of-file when read.
  2. /dev/random & /dev/urandom, which provide endless streams of random data when read.

Most file paths exist universally, that is to say, they have a global scope, but some special file paths are scoped locally, that is to say each process gets its own copy of these special files. The three most common process-specific special files are:

  1. /dev/stdin which represents the readable end of the process’s standard input stream.
  2. /dev/stdout which represents the writeable side of the process’s standard output stream.
  3. /dev/stderr which represents the writeable side of the process’s standard error stream.
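As a quick sketch of these special files in action (using the redirection operators we’re about to meet properly), notice that anything that can read or write a file can be pointed at a stream:

# write a message to the standard error stream via its special file
echo 'something went wrong' >/dev/stderr

# read from the standard input stream via its special file
echo 'hello in there!' | cat /dev/stdin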

Stream Redirection

The POSIX specification allows processes to create streams and remap streams, so command shells like Bash can use the POSIX APIs to manipulate streams. It’s these stream manipulations that I refer to as terminal plumbing.

Stream Redirection in Bash

Bash provides a number of operators for manipulating streams. These operators adjust the plumbing on a command before the command executes. We’ll look at the most important of these operators in a moment, but there are two things to note — firstly, this is not an exhaustive list, and secondly, later versions of Bash have added new redirection operators that are so-called syntactic sugar. They add no new capabilities, but they allow for nicer (usually shorter) syntaxes. To keep this series as cross-platform as possible I’ll be ignoring these sugary treats 🙂

Note that these operators work on files and file descriptors. Remember that file descriptors are numeric IDs representing the ends of streams, and that files can be regular files or special files, including devices.

The < Operator: File → Readable File Descriptor

The < operator connects a file to an input stream. In its most generic form, the operator has the following syntax:

FILE_DESCRIPTOR<FILE

However, the file descriptor is optional and has a very sensible default, 0 (STDIN), which is almost always what you want, so you’ll usually see the operator used like so:

wc -l </etc/hosts

The > and >> Operators: Writeable File Descriptor → File

These operators connect the writeable side of a stream to a file. The difference between > and >> is that > replaces the contents of the file, while >> appends to it; both operators will create the file if one doesn’t exist already. The generic syntax is:

FILE_DESCRIPTOR>FILE
FILE_DESCRIPTOR>>FILE

Again, the file descriptor is optional, and has the sensible default of 1 (STDOUT). These operators are often used to redirect error messages, i.e. with the file descriptor 2 for STDERR, so you’ll commonly see examples like:

# standard out to new file (replace)
someScript.sh >output.log

# standard out appended to file
someScript.sh >>output.log

# standard error to new file (replace)
someScript.sh 2>error.log

# standard error appended to file
someScript.sh 2>>error.log

The | Operator: The Command Pipeline

This redirection operator is used to connect two processes together, and the general syntax is:

COMMAND1 | COMMAND2

The operator connects the writeable end of the first command’s standard output stream to the readable end of the second command’s standard input stream. In other words, the first command’s file descriptor 1 becomes the second command’s file descriptor 0.

For example:

grep local /etc/hosts | wc -l

File Descriptor Aliases

To make it easy to merge streams, Bash makes special file names available for all file descriptors. They’re simply the file descriptor’s numeric ID prefixed with an ampersand (&), so the file descriptor 0 can be referenced as the file &0.

This can get confusing, but just remember that whenever the Bash spec says you should provide a file descriptor, you use the bare number, and when it says you should provide a file, you use the number prefixed with an ampersand.

Merging Standard Error into Standard Out

To give a practical example of using the file version of a file descriptor, let’s see how we would merge standard error into standard out. We’re not connecting two processes, so we can’t use the | operator, and these are output streams, so we can’t use <, which leaves > and >>. Appending to a stream makes no sense, so by process of elimination, we must need >.

The spec for > requires a source file descriptor and a destination file. We want to merge standard error into standard out, so the source is the file descriptor 2. That needs to go to a file, but we want to use the file descriptor 1, so we prefix it with an &. Putting this all together, to redirect standard error to standard out, we use 2>&1.

You Can Do Multiple Redirections

It’s perfectly fine to define multiple redirections on a single command, and to combine the redirection operators with the pipeline operator. Just remember that a pipeline joins commands, so redirections on the left of a pipe only apply to the command on the left, and redirections on the right only to the command on the right.

For example, to send the regular output from a script to one file, and the errors to another you can use commands of the form:

someScript.sh >output.log 2>error.log

And, if you want to send both regular and error output to the next command on the pipeline you need to merge the output and error before the pipeline, so your commands would take the form:

someScript.sh 2>&1 | someOtherScript.sh
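One gotcha worth bearing in mind: Bash applies redirections from left to right, so the order in which you combine a file redirection with 2>&1 matters. A sketch, using the same hypothetical scripts as above:

# both streams go to the file; STDOUT is pointed at the file first, then
# STDERR is pointed at wherever STDOUT currently points (the file)
someScript.sh >all.log 2>&1

# probably not what you want; STDERR is pointed at the terminal (where
# STDOUT currently points), and only then is STDOUT sent to the file
someScript.sh 2>&1 >all.log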

Experimenting with Redirection and Our Scripts

To see all this in action, let’s make use of my sample solution for the previous challenge (pbs149-challengeSolution.sh in the instalment ZIP).

Redirecting a File Into a Script

The < operator is used to send the contents of a file into a script as the standard input, so we’ll redirect the file favouriteOrder.txt in the instalment ZIP into my sample solution. Before we do that, let’s quickly look at its contents:

2
3
1

You’ll notice those correspond to the menu options pancakes, waffles, and done.

So let’s go ahead and redirect this file into the script:

./pbs149-challengeSolution.sh <favouriteOrder.txt

What you’ll see is that the script behaves as if we had typed 2 the first time we were offered the menu, then 3 the second time, and finally 1 the third time, i.e. we order pancakes, waffles, and then finish our order.

The contents of the file took the place of our keyboard.

Redirecting the Output of a Script to a File

Next, let’s redirect the output from the script to a file named transcript.txt with the > operator (while continuing to use favouriteOrder.txt as the input):

./pbs149-challengeSolution.sh <favouriteOrder.txt >transcript.txt

The first thing you’ll notice is that most of the output no longer appears in the terminal because standard out has been redirected to the file. We can see what was captured with cat transcript.txt:

Choose your breakfast (as many items as you like)
Added pancakes to your order
Added waffles to your order


You ordered the following 2 items:
* pancakes
* waffles

Notice that the output has been bifurcated: some went to the terminal, some to the file, and none to both.

The docs for the select command reveal the reason for this:

“… the set of expanded words is printed on the standard error output stream …”

In other words, the menu was sent to standard error, and we only sent standard out to the file, so standard error was still connected to the terminal session.

Writing to STDERR rather than STDOUT

At the moment the question (‘Choose your breakfast …’) is being sent to standard out, but the options to standard error. That’s silly, so let’s also send the question to standard error where the options are going.

To do that we just need to redirect the relevant echo command’s standard output to standard error with the > operator. We could use the long form of the operator (1>&2), but since standard out is the default, we can shorten it to:

echo 'Choose your breakfast (as many items as you like)' >&2

Other feedback related to the select command should be redirected in the same way. You’ll find a version of the script with all the appropriate echo commands redirected to STDERR in the file pbs150a-menu.sh in the instalment ZIP.

We can run our favourite order through this updated script with:

./pbs150a-menu.sh <favouriteOrder.txt >transcript.txt

And now we get a more useful transcript:

You ordered the following 2 items:
* pancakes
* waffles

Better yet, we can use the command interactively, even when capturing the output in the transcript file. Try it with:

./pbs150a-menu.sh >transcript.txt

So, think carefully about what output your script sends to STDOUT, and what output it sends to STDERR.
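As a rule of thumb (my own sketch of the convention, with an illustrative variable name): data goes to STDOUT, chatter goes to STDERR:

# results/data to STDOUT, so they can be piped or captured cleanly
echo "$someResult"

# prompts, progress messages & errors to STDERR, so they reach the human
echo 'processing your order …' >&2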

Scripts in Pipelines

Now let’s use our script in a pipeline. First, we’ll receive an order from an echo command:

echo -e "2\n3\n1" | ./pbs150a-menu.sh

We can of course then go on to redirect the output from our script to another script or terminal command by adding another pipe. In this case, we can count the lines the script writes to standard out with:

echo -e "2\n3\n1" | ./pbs150a-menu.sh | wc -l

You’ll see the answer is 4, but the output to STDERR coming from our script is confusing things.

Sending Errors into Oblivion

Let’s hide the unwanted standard error output in the previous command by redirecting it (file descriptor 2) to the null device (/dev/null):

echo -e "2\n3\n1" | ./pbs150a-menu.sh 2>/dev/null | wc -l

Now our line count is much clearer!

Some Scripting Considerations

File Descriptors are Inherited

When your script calls another terminal command, that terminal command inherits the three standard file descriptors (STDIN, STDOUT & STDERR) from your script. This means that any command you call that writes to STDOUT sends its output to wherever your script’s output is going, and any command you call that tries to read from STDIN will read from wherever your script is getting its standard input.

This is why the read and select commands in our sample solution worked when we redirected our text file to the script’s input stream.
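As a simple illustration (a hypothetical script, not in the instalment ZIP), imagine a script named countLines.sh containing nothing but a call to wc:

#!/usr/bin/env bash

# wc inherits this script's STDIN, so it counts whatever the script receives
wc -l

Running ./countLines.sh </etc/hosts prints the number of lines in /etc/hosts, even though wc itself was never told about any file.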

Reading Streams is Destructive

When you read a character from an input stream it’s gone from that stream! That’s true of all streams, including the standard input stream.

Don’t worry, this does not mean regular files get deleted by reading them! When you redirect a regular file to a stream, the contents of the file get copied into the stream, and it’s that copy that gets consumed as your script reads from the stream.

What this does mean is that you only get one shot at reading any stream, so you need to capture and save anything you need later!
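We can prove this to ourselves with a tiny sketch of a script (again, hypothetical) that tries to read STDIN twice:

#!/usr/bin/env bash

# the first cat consumes everything in STDIN ...
firstRead=$(cat)

# ... so this second cat hits end-of-file immediately and captures nothing
secondRead=$(cat)

echo "first read: '$firstRead'"
echo "second read: '$secondRead'"

Pipe anything into this script and you’ll see the first variable gets all the content, and the second gets none.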

Capturing STDIN

We’ve already seen that the read command takes its input from STDIN, but it does so in pieces, not all at once. By default it reads line-by-line, but what’s really going on is that read copies characters from STDIN into a variable until it meets its delimiter, which defaults to the newline character and can be changed with the -d option; the characters in the POSIX variable $IFS are then used to split what was read into fields. This means you can use read to slurp in all the data in STDIN by setting the delimiter to an empty string and clearing $IFS. I’m not a fan of this approach because it means I have to remember to make my change to $IFS in such a way that I avoid ‘spooky action at a distance’ bugs.
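For completeness, here’s a sketch of that read-based slurp; note how the change to $IFS is scoped to the read command alone:

# read until end-of-file (-d '') without mangling backslashes (-r) or
# whitespace (IFS=); note read exits non-zero when it hits EOF, so don't
# chain anything off its exit status
IFS= read -r -d '' theInput
echo "$theInput"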

My preferred method for reading all of STDIN into a variable at once, known colloquially as slurping the standard input, is to use the cat command with no arguments. As its man page says, when passed no arguments it writes STDIN to STDOUT. Because cat will inherit our script’s STDIN, it will read what we want it to read, and we can then use the $() syntax to store the command’s output in a variable.

We can see this in action with the simple script pbs150b-naiveShouter.sh in the instalment ZIP:

#!/usr/bin/env bash

# always read STDIN into a variable
theInput=$(cat)

# convert the input to all upper case with the tr command
theOutput=$(echo "$theInput" | tr '[:lower:]' '[:upper:]')

# output the upper-cased text
echo "$theOutput"

This script slurps all of STDIN, converts it to upper case using the tr (translate) command, then outputs the upper-case, i.e. shouty text to STDOUT.

Notice how we read the input into $theInput with the simple line:

theInput=$(cat)

We can see this script in action with the command:

echo 'hello world!' | ./pbs150b-naiveShouter.sh

But what happens if we try to run this script without redirecting something into STDIN? The script hangs! It’s waiting for input from the keyboard, and because we’re using cat rather than read, it will keep waiting even after we hit return. In fact, it will keep waiting for input until we enter the EOF (End of File) character, which we can type with ctrl+d. Try it:

./pbs150b-naiveShouter.sh

Intelligently Slurping STDIN

That’s clearly suboptimal, so the obvious question you’re probably asking is ‘how do we detect whether or not STDIN has content for us to slurp?’.

I’m sorry to say, if you need your script to be portable, and to work everywhere, you can’t. 🙁

If you know your script will always be run with modern versions of Bash, i.e. never on a Mac (Apple has a philosophical argument against GPLv3), then you can use the exit code from read -t 0 to detect whether or not there is data to be read in STDIN, but alas that doesn’t work in older versions of Bash, like the one Apple still ship in even the very latest macOS versions.

That does not mean we can’t write user-friendly scripts — we can simply copy the behaviour of standard terminal commands like cat and grep.

Both commands follow a convention used by most terminal commands that need a stream of input to work on. They implement a simple two-rule algorithm:

  1. If zero files are specified, read from STDIN.
  2. If one or more files are specified, check each passed file path to see if it’s the string -, and if it is, read from STDIN, otherwise, read from the specified file.

Because this convention is so common, getopts does not get confused by a lone -, which is simply understood to mean the script should read from STDIN.

To see this in action, let’s update our shouting script to implement the cat convention, and, to support an optional -b flag to request an exclamation point (a bang) be appended to the output.

You’ll find this script in the instalment ZIP as pbs150c-shouter.sh:

#!/usr/bin/env bash

# start by processing the options
bang='' # assume no bang
while getopts ":b" opt
do
    case $opt in
        b)
            bang='!'
            ;;
        ?)
            echo "Usage: $(basename $0) [-b] [FILE|-]..."
            exit 2
            ;;
    esac
done
# remove the already-processed options from the argument list
shift $(echo "$OPTIND-1" | bc)

#
# -- load the text to shout --
#
text=''

# if there are no args, read STDIN
if [[ $# = 0 ]]
then
    text+=$(cat)
fi

# loop over all args and read from files or STDIN as appropriate
for path in "$@"
do
    # determine whether to treat as file or STDIN
    if [[ $path = '-' ]]
    then
        # read from STDIN
        text+=$(cat)
    else
        # read from file
        text+=$(cat "$path")
    fi
done

# convert the text to all upper case with the tr command
shoutedText=$(echo "$text" | tr '[:lower:]' '[:upper:]')

# output the upper-cased text and its optional bang
echo "$shoutedText$bang"

We can now see this script behave like cat with the following examples:

# no args means assume STDIN
echo 'hello world' | ./pbs150c-shouter.sh
echo 'hello world' | ./pbs150c-shouter.sh -b

# regular file args are read as files
./pbs150c-shouter.sh menu.txt
./pbs150c-shouter.sh -b menu.txt

# the - can be added to the file list to read STDIN and regular files
echo -e "\n\nhello world" | ./pbs150c-shouter.sh -b menu.txt -

Direct Access to the Keyboard and Terminal

By default, Bash’s various commands for reading input from the user and writing output will use the three standard streams. Most of the time that’s fine, but what if you really need to write a message to the terminal, or to read input from the keyboard, regardless of whether or not your script is in a pipeline? Can you access the terminal directly?

Why would you want to do this? A great example is invoking an SQL command on a database server that needs a password. It’s reasonable to redirect a file with SQL statements into the command, and to send the output to a log file and the errors to a separate error log, so that means standard in is coming from the SQL file, and standard out and standard error are going to their separate log files. How can the SQL command tell you it needs a password, and let you enter one? If you try with mysql you’ll see that it does still somehow manage to write the password prompt to your terminal, and to read your password from the keyboard. How?

The POSIX standard specifies that each process should get its own copy of a special file named /dev/tty which is a readable and writeable reference to the terminal session the process is running in. This means you can always read from the keyboard by using </dev/tty, and you can always write to the terminal with >/dev/tty.
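To make that concrete, here’s a minimal sketch of a script that prompts for a password even when all three standard streams are redirected (the variable name is just illustrative):

#!/usr/bin/env bash

# prompt on the terminal itself, not STDOUT (which may be redirected)
echo -n 'Password: ' >/dev/tty

# read silently (-s) from the keyboard itself, not STDIN
read -r -s password </dev/tty
echo '' >/dev/tty # read -s swallows the newline, so print one back

echo "password received (${#password} characters)"

Try saving that as a script and running it with all its plumbing redirected, e.g. with </dev/null >out.log 2>err.log appended, and the prompt and keyboard still work.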

Final Thoughts

We’re nearing the end of our Bash journey now — we can write shell scripts that behave just like standard terminal commands. They can play nice in the pipeline, they can accept options in the traditional way, and they’ll run reliably on a very broad range of OSes.

What’s left to do? Mostly, just a little housekeeping!

I want to finish the series by making some things we’ve been doing implicitly explicit, specifically the different kinds of expansions Bash can do, and the exact behaviour of the various kinds of brackets that can be used to group things.

But, before doing that, I want to look at controlling scope in a little more detail, and, at better output formatting with printf rather than echo. That’s where we’ll start next time.

An Optional Challenge

Update your challenge solution from last time so it can ingest its menu in three ways:

  1. Default to reading the menu from a file named menu.txt in the same folder as the script.
  2. Read the menu from STDIN with the optional argument -m -.
  3. Read the menu from a file with the optional argument -m path_to_file.txt.

Regardless of where the menu is coming from, always present the menu interactively, i.e., the user always has to choose using the keyboard.

Join the Community

Find us in the PBS channel on the Podfeet Slack.
