
Programming by Stealth

A blog and podcast series by Bart Busschots & Allison Sheridan.

PBS 149 of X: Better Arguments with POSIX Special Variables and Options (Bash)

15 Apr 2023

Before we get started, we want to make sure everyone knows that a link to the shownotes for every episode is in the podcast feed. We also want to make sure you know that you can join in the conversation with other Programming By Stealth learners by going to the Podfeet Slack at podfeet.com/slack and joining the #pbs channel.

In this instalment we’re going to circle back to accepting command line arguments in our scripts, and fill in the advanced features and best practices we didn’t cover in our first look at the topic.

To do that we need to formally acknowledge something we’ve been using implicitly — the POSIX standard. In particular, the special variables the standard requires all compliant shells (including sh & Bash) support.

We’ll end the instalment by looking at Bash’s built-in support for optional arguments via the getopts command.

Matching Podcast Episode

Listen along to this instalment on episode 765 of the Chit Chat Across the Pond Podcast.

You can also Download the MP3

Read an unedited, auto-generated transcript: CCATP_2023_04_15

Episode Resources

PBS 148 Challenge Solution

The optional challenge at the end of the previous instalment was to update our breakfast menu script so it can optionally accept an argument: a limit on the number of items you can order.

Here’s my sample solution, which you’ll find in the instalment ZIP as pbs148-challengeSolution.sh:

#!/usr/bin/env bash

# read the menu
declare -a menu
while read -r menuLine
do
    # skip comment lines
    echo "$menuLine" | egrep -q '^[ ]*#' && continue

    # skip empty lines
    echo "$menuLine" | egrep -q '^[ ]*$' && continue

    # store the menu item
    menu+=("$menuLine")
done <<<"$(cat "$(dirname "$BASH_SOURCE")/menu.txt")"

# default to unlimited items, then check if there is a first argument
limit=''
if [[ -n $1 ]]
then
    # validate the argument
    if echo "$1" | egrep -q '^[1-9][0-9]*$'
    then
        limit=$1
    else
        echo "invalid argument '$1' - must be a whole number greater than 0"
        exit 2 # custom exit code for bad arg
    fi
fi

# create an empty array to hold the order
declare -a order

# present the menu, with a done option
if [[ -z $limit ]]
then
    echo 'Choose your breakfast (as many items as you like)'
else
    echo "Choose up to $limit breakfast items"
fi
select item in done "${menu[@]}"
do
    # skip invalid selections ($item is empty)
    [[ -z $item ]] && continue

    # exit if done
    [[ $item == done ]] && break

    # store and print the item
    order+=("$item")
    echo "Added $item to your order"

    # if we're limiting, check the limit
    if [[ -n $limit ]]
    then
        [[ ${#order[@]} -ge $limit ]] && break
    fi
done

# print the order
echo -e "\nYou ordered the following ${#order[@]} items:"
for item in "${order[@]}"
do
    echo "* $item"
done

This solution doesn’t use anything we’ve not technically covered, but there are two related things I’d like to draw your attention to.

Firstly, I think this is the first time my examples have made use of the -n test for not zero-length. It’s basically the inverse of -z, which tests for a length of zero. I first use it to see if an optional first argument ($1) was passed:

if [[ -n $1 ]]

Secondly, I chose to use a blank limit to represent the absence of any limit. The reason is quite simple: it allows both -z and -n to be used to quickly test for the presence or absence of a limit, e.g.:

# if there's no limit
if [[ -z $limit ]]
then
    echo 'Choose your breakfast (as many items as you like)'
else
    echo "Choose up to $limit breakfast items"
fi

# …

# if there is a limit
if [[ -n $limit ]]
then
    [[ ${#order[@]} -ge $limit ]] && break
fi

Special POSIX Variables

Before we can move on to learning how to add traditional terminal-style flags to our own scripts, we need a more detailed understanding of how Bash handles arguments, and before we can do that, we need to formally introduce a concept we’ve already met informally — the special POSIX variables that are available in all POSIX compliant shells, including Bash.

We’ve actually already seen some of these variables, but we’ve not mentioned that they’re bigger than Bash, that they’re actually part of the much more general POSIX standard. Here’s a table of the most important special POSIX variables:

POSIX Variable Description
$PATH A list of locations to search for executables (see Taming the Terminal Part 13)
$HOME 🆕 The path to the current user’s home directory
$IFS 🆕 The Input Field Separator is a powerful but dangerous variable that should be used with extreme caution because it subtly affects the operation of multiple commands, so it’s very prone to the kind of spooky action at a distance bugs that will drive you nuts! We may cover it later in the series.
$? The exit code of the previously executed command
$$ 🆕 The process ID (PID) of the currently running script
$! 🆕 The process ID (PID) of the most recently started background task (we’re going to ignore this one)
$0 🆕 The name of the currently running script
$1 to $9 The positional arguments
$# 🆕 The number of arguments passed to the script
$* 🆕 A pseudo-array of all the arguments concatenated, then split on spaces (details to follow)
$@ 🆕 A pseudo-array of all the arguments (details to follow)

If you’re curious to learn more about the POSIX standard, this extended tutorial is very comprehensive.
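As a quick illustration of three of the symbol-style variables from the table, here is a minimal sketch. The function name show_specials is just a placeholder I made up for this example; it is not in the instalment ZIP.

```bash
#!/usr/bin/env bash

# show_specials is a hypothetical helper used only for this sketch
show_specials() {
    # run a command that always fails, and capture its exit code from $?
    local status=0
    false || status=$?
    echo "false exited with code $status"

    # $$ is the process ID of the running shell
    echo "this shell's PID is $$"

    # $0 is the name the script was started as
    echo "this script was called as $0"
}

show_specials
```

The PID and the script name will differ from run to run and from machine to machine, so only the first line of output is predictable.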

I want to focus on the symbol-style POSIX variables in this instalment, and most especially, the three argument-related ones we’ve not met before — $#, $*, and $@.

The Number of Args ($#)

By far the simplest of these three to understand is $#. It’s just a number: 0 if there were no arguments passed, 1 if there was one passed, and so on. You can see it in action in the very simple script pbs149a-numArgs.sh in the instalment ZIP:

#!/usr/bin/env bash

echo "You passed $# args"

If you call the script without args you’ll see it report 0, if you call the script with just the single argument 42 you’ll see it report 1, and if you call the script with a quoted argument, it does the sensible thing:

./pbs149a-numArgs.sh 42 'life universe everything'
# returns: You passed 2 args
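A common use for $# is checking that the right number of arguments was passed before doing any work. Here is a minimal sketch of that idea; the function name require_two_args and the exit code 2 are my own choices for illustration. Inside a function, $# counts the function's arguments rather than the script's, which makes the idea easy to try out.

```bash
#!/usr/bin/env bash

# require_two_args is a hypothetical helper used only for this sketch;
# inside a function, $# counts the function's own arguments
require_two_args() {
    if [[ $# -ne 2 ]]
    then
        echo "expected 2 args but got $#"
        return 2 # custom exit code for a bad arg count
    fi
    echo "got exactly 2 args"
}

# two args (the quoted one counts as a single arg), so the check passes
require_two_args 42 'life universe everything'

# one arg, so the check fails and $? holds the function's exit code
require_two_args 42 || echo "the check failed with exit code $?"
```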

The $* & $@ Expansions

These variables are weird! They’re referred to as expansions, because like *.txt they expand out into a list of arbitrarily many arguments. The two are almost the same, but they differ in a subtle but very important way: they split the args up differently. $* also has the extra complexity that its behaviour is altered by the input field separator ($IFS), so it’s vulnerable to spooky action at a distance. To see how they differ, let’s use a simple demo script (pbs149b-argsExpansions.sh in the instalment ZIP):

#!/usr/bin/env bash

echo "You passed $# args"
echo ''
echo 'Un-quoted $* expands them to:'
for a in $*
do
    echo "* $a"
done
echo ''
echo 'Un-quoted $@ expands them to:'
for a in $@
do
    echo "* $a"
done
echo ''
echo 'Quoted $* expands them to:'
for a in "$*"
do
    echo "* $a"
done
echo ''
echo 'Quoted $@ expands them to:'
for a in "$@"
do
    echo "* $a"
done

When we run this script with a simple arg and a quoted arg with spaces like so:

./pbs149b-argsExpansions.sh 42 'life universe everything'

We get the following behaviour for the four ways to use these two variables:

You passed 2 args

Un-quoted $* expands them to:
* 42
* life
* universe
* everything

Un-quoted $@ expands them to:
* 42
* life
* universe
* everything

Quoted $* expands them to:
* 42 life universe everything

Quoted $@ expands them to:
* 42
* life universe everything

Looking at those outputs, it’s the final option, "$@", that gives the behaviour you’ll want 99.9% of the time.
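The remaining 0.1% is mostly where the $IFS sensitivity of "$*" becomes useful rather than dangerous: a quoted "$*" joins the args using the first character of $IFS. The sketch below is an illustration under my own assumptions, with a made-up function name join_with_commas, and it changes $IFS only inside the function so the change can't leak out.

```bash
#!/usr/bin/env bash

# join_with_commas is a hypothetical helper used only for this sketch
join_with_commas() {
    # local limits the IFS change to this function, avoiding
    # spooky action at a distance elsewhere in the script
    local IFS=','

    # quoted $* joins all the args with the first character of $IFS
    echo "$*"
}

join_with_commas 42 'life universe everything'
# prints: 42,life universe everything
```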

Arguments → Bash Arrays

Finally, you’ll often want to access your arguments as a traditional array. You can do that by combining Bash’s array creation syntax with the "$@" POSIX args expansion, i.e. ("$@"), as demonstrated in pbs149c-argsArray.sh in the instalment ZIP.

Here is the script, which we’ll run with the simple and quoted example args we’ve been using in the previous examples:

#!/usr/bin/env bash

# store the passed args in an array named $args (could be any name)
args=("$@")

# demo array
echo "The array \$args contains ${#args[@]} items:"
for a in "${args[@]}"
do
    echo "* $a"
done

When we run it with those two args, we can see they have been properly captured in the array, one array element per arg:

The array $args contains 2 items:
* 42
* life universe everything
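Once the args are in a regular array, all the usual array syntax applies to them. The following is a small sketch rather than anything from the instalment ZIP; describe_args is a made-up name, and the sketch assumes at least two args are passed.

```bash
#!/usr/bin/env bash

# describe_args is a hypothetical helper used only for this sketch
describe_args() {
    # capture the function's args in an array
    local args=("$@")

    # arrays are indexed from zero, unlike $1, $2, and so on
    echo "first: ${args[0]}"
    echo "second: ${args[1]}"

    # ${args[@]:1} is every element from index 1 onwards
    local rest=("${args[@]:1}")
    echo "others: ${#rest[@]}"
}

describe_args 42 'life universe everything' towel
```

Note that the first arg is $1 as a positional variable but index 0 in the array, which is an easy off-by-one mistake to make.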

Command line Options with getopts

With all that ground work laid, we’re ready to use Bash’s built-in getopts command to add support for command line options to our scripts.

Don’t confuse Bash’s built-in getopts command with the older and more difficult to use getopt command that ships with some Linux/Unix distros.

We can use getopts to handle two distinct types of option:

  1. Boolean Options (present/absent) AKA flags
  2. Optional Arguments which require a value

getopts uses the long-standing tradition of marking both kinds of options using arguments that start with the - symbol. At the terminal level, a boolean option is a single argument, while an optional argument is actually two arguments, the name of the option, and its value.

Flag Options

Especially if you’ve followed Taming the Terminal, you’ll have seen lots of examples of command-line flag options. They are simple single letters prefixed with a minus; when they’re present one thing happens, when they’re not, something else does.

The example you’ve probably seen most often is the -l flag for the ls command: when absent the files are listed in a multi-column view, when present the files are listed in a list view, one file per line.

Something else you often see, and that getopts supports, is flag cuddling: multiple flags can be combined into a single command line argument. Again, ls serves as a common example. What does ls -al mean? It’s actually a cuddled shortcut for ls -a -l, i.e., ls with two flags, -a to show all files, and -l for the list view.

Optional Arguments

An optional argument is like a flag, but it has both a single-letter name and a value, so it gets passed as two arguments, the name (prefixed with a -), and the value. An example would be the -p optional argument for specifying a non-standard port when using the ssh command, e.g.:

ssh -p 2222 someuser@some.server

Some people wrongly refer to optional arguments as flags. What makes a flag a flag is that it has no value; it’s just there or not there.

Some Limitations to getopts

Note that getopts has been around for a very long time, it actually dates back to the original Bourne shell (sh), so it doesn’t have all the modern bells and whistles. Its two biggest shortcomings are:

  1. No support for long options: you can only have 62 single-letter/digit flags (they are case-sensitive), and you only have two obvious choices for words starting with a given letter (e.g. -a and -A for an archive option), so what do you do when you need three flags whose full meaning starts with the same letter? If your script can optionally add, archive, and append, what do you do? This is why many programming languages now provide support for a newer convention, long options, where two minuses are prefixed to multi-letter names, e.g. the git checkout subcommand supports the --track long flag to specify that a tracking relationship should be added while checking out the remote branch.
  2. Options must precede regular args: many more advanced programming languages allow regular arguments and options to be all mixed up together, some arguments, then some flags, then some more arguments, then an optional argument, etc. getopts is not that clever, it’s always options then arguments.

Using getopts

The first thing to say is, getopts is a little weird! It processes your arguments one option at a time, so you have to use it in a loop, and it uses three variables to store its state, and its exit code to indicate whether or not it’s finished.

Let’s start with those three variables:

  1. You need to tell getopts the name of the variable it should store the name of the flag it just processed in. That means all the examples you see on the web will vary slightly as each author chooses their own name for this vital variable. For clarity, I’m going to always use opt, which is a common convention.
  2. The value associated with the most recently found option will always be stored in $OPTARG
  3. The index of the next argument to be processed by getopts is stored in $OPTIND

What arguments does getopts need? The first thing it needs to be told is which options it should consider valid. It accepts that information in the form of a single string. You add each letter you want to accept as an option name to the string, and if the option is an optional argument (as opposed to a flag), you add a : after the option name.

Then, just to mess with all our heads, someone decided the : should have a totally different meaning when added as the first character in the string: there, it suppresses the error messages getopts outputs by default. It’s generally considered better to use your own error messages, so I suggest always starting your options list with a :.

So, to accept a flag named f and an optional argument named o with errors suppressed, the option string is :fo:. The first : is the no errors please indicator, the f specifies a flag (no value) named f, and the o: an optional argument (has a value) named o.

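The only other argument that’s required is the name of the variable in which you want getopts to store the name (single letter) of the flag or optional argument it just found. Note that this variable will always contain either a single letter or a symbol that signals a problem: ? if the user specified an unknown option, or, when errors are suppressed with a leading :, the symbol : if the user left out the value for an optional argument (without the leading :, a missing value is also reported as ?). You need to check for these symbols if you want to properly handle errors.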

So, we have a command that updates three variables, steps through your arguments one option at a time, and needs you to check the contents of a variable to know which option was found. That leads to an obvious design pattern — getopts is always used with a while loop that uses a case conditional to apply the appropriate action.

This is all much harder to describe than to show, so let’s look at pbs149d-getoptsBasic.sh in the instalment ZIP:

#!/usr/bin/env bash

beSnarky='' # assume -s is not passed
worldAdjective='wonderful' # default adjective for the world

# accept a flag named s to request snark, and an optional
# adjective w to describe the world
# save the name of the matched option in $opt (our choice of name)
while getopts ':sw:' opt
do
    case $opt in
        s)
            # enable snark!
            echo 'DEBUG enabling snark'
            beSnarky=1
            ;;
        w)
            # store the adjective
            echo "DEBUG saving custom adjective '$OPTARG'"
            worldAdjective="$OPTARG"
            ;;
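        ?)
            # an un-quoted ? matches any single character, so this case
            # catches every problem getopts reports
            # render a sane error, then exit
            echo "Usage: $(basename "$0") [-s] [-w ADJECTIVE]"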
            exit 1
            ;;
    esac
done

# assemble the greeting, then print it
greeting=''
if [[ -n $beSnarky ]]
then
    greeting+="Oi you"
else
    greeting+="Hello"
fi
greeting+=" $worldAdjective world"'!'
echo "$greeting"

This script defaults to printing the traditional ‘Hello world’ greeting, but with a twist. Firstly, it uses an adjective to describe the world, which defaults to wonderful, and supports a -w optional argument for specifying an alternative adjective. Secondly, it can become optionally snarky with the -s flag.

The key point to note is that we initialise some variables to a default state, then we process our options to update the state of those variables, then we do whatever it is the script is supposed to do. As described, we use a combination of while, getopts, and case to process the options.

On a side note, I chose a boiler-plate error message that uses the POSIX special variable $0 to include the script’s name, stripped of its preceding path by the basename command.

You can see the results from calling the script with various arguments below:

./pbs149d-getoptsBasic.sh
> Hello wonderful world!
./pbs149d-getoptsBasic.sh -s
> DEBUG enabling snark
> Oi you wonderful world!
./pbs149d-getoptsBasic.sh -s -w beautiful
> DEBUG enabling snark
> DEBUG saving custom adjective 'beautiful'
> Oi you beautiful world!
./pbs149d-getoptsBasic.sh -a
> Usage: pbs149d-getoptsBasic.sh [-s] [-w ADJECTIVE]
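The script above funnels every problem into a single usage message. When the option string starts with a :, getopts actually reports two distinct problems: it sets the variable to ? for an unknown option and to : for an optional argument that is missing its value, and in both cases it puts the offending letter in $OPTARG. The following is a minimal sketch of telling the two apart, not a replacement for the script in the ZIP; the function name parse_demo and the wording of the messages are my own. The ? is escaped as \? so it matches only a literal question mark.

```bash
#!/usr/bin/env bash

# parse_demo is a hypothetical helper used only for this sketch
parse_demo() {
    # getopts tracks its position in OPTIND, so reset it for each call
    local OPTIND=1 opt
    while getopts ':sw:' opt
    do
        case $opt in
            s)
                echo 'flag -s'
                ;;
            w)
                echo "option -w with value '$OPTARG'"
                ;;
            :)
                # the option that lacks its value is in $OPTARG
                echo "option -$OPTARG needs a value"
                return 1
                ;;
            \?)
                # the unrecognised option letter is in $OPTARG
                echo "unknown option -$OPTARG"
                return 1
                ;;
        esac
    done
}

parse_demo -s -w shiny
parse_demo -x || true
parse_demo -w || true
```

The three calls print the two matched options, then unknown option -x, then option -w needs a value.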

Options and Arguments

We’ve almost learned everything we need to start being productive with getopts, but we’re missing one important piece — how do we handle regular arguments in conjunction with options?

Because options are just arguments with a special naming convention, they appear in the $* and $@ POSIX expansions, and in the $1 to $9 positional argument variables, so if we wanted to add a required argument for the user’s name to our script, the variable that contains the name could be $1 if neither the -s nor -w options were used, $2 if just the -s flag was used, $3 if just the -w optional argument was used, and $4 if both -s and -w were used!

There must be a solution for this mess? Of course there is: we can use the shift built-in command to slide all the arguments over by the appropriate amount, effectively discarding the options once they’ve been processed. When you call shift with no arguments, $*, $@, $# and the $1 to $9 variables all get updated as if the first argument had never existed. If you pass a number as the first argument, shift performs that task that many times.
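To see shift in isolation before combining it with getopts, here is a small sketch. The function name shift_demo is made up for this example; shift works on a function's arguments exactly as it does on a script's.

```bash
#!/usr/bin/env bash

# shift_demo is a hypothetical helper used only for this sketch
shift_demo() {
    echo "before: $# args, first is '$1'"

    # discard the first arg
    shift
    echo "after shift: $# args, first is '$1'"

    # discard the next two args in one go
    shift 2
    echo "after shift 2: $# args, first is '$1'"
}

shift_demo a b c d e
```

Running it prints 5 args with a first, then 4 args with b first, then 2 args with d first.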

We could do the shifting inside the case statement, by calling shift in each flag’s case, and shift 2 in each optional argument’s case, but that would be very messy. Instead, we can make use of the $OPTIND variable, which getopts uses to track its position in the argument list. When the while loop finishes, $OPTIND will contain the index of the first non-option argument. We need to remove all the arguments that came before that one, so after our while loop we need to shift away one less than $OPTIND arguments. Using just the syntax we’ve met to date, we can do that with:

shift $(echo "$OPTIND-1" | bc)

Here we’re using the bc (basic calculator) command to do the math, but there is a more efficient way. For now, please take it on faith that the best way of shifting away the options is with:

shift "$(($OPTIND -1))"

Note that we’ve not yet discussed what $(( )) does, which is why this command doesn’t make sense just yet.
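Without getting ahead of ourselves, you can at least confirm that the $(( )) form produces the number we need. This tiny sketch uses a made-up function, one_less, and a made-up value of 4 standing in for what getopts would leave in $OPTIND.

```bash
#!/usr/bin/env bash

# one_less is a hypothetical helper used only for this sketch; it
# subtracts one from its first argument using $(( ))
one_less() {
    echo "$(($1 - 1))"
}

# 4 is a stand-in for the value getopts would leave in $OPTIND
one_less 4
# prints: 3
```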

Again, this is easier to show than to say, so let’s look at pbs149e-getoptsArgs.sh in the instalment ZIP:

#!/usr/bin/env bash

beSnarky='' # assume -s is not passed
adjective='brilliant' # default adjective for the person

# store the usage string
usage="Usage: $(basename "$0") [-s] [-a ADJECTIVE] NAME"

# accept a flag named s to request snark, and an optional
# adjective a to describe the user
# save the name of the matched option in $opt (our choice of name)
while getopts ':sa:' opt
do
    case $opt in
        s)
            # enable snark!
            echo 'DEBUG enabling snark'
            beSnarky=1
            ;;
        a)
            # store the adjective
            echo "DEBUG saving custom adjective '$OPTARG'"
            adjective="$OPTARG"
            ;;
        ?)
            # render a sane error, then exit
            echo "$usage"
            exit 1
            ;;
    esac
done

# remove the options from the args
shift $(echo "$OPTIND-1" | bc)

# process the remaining args, requiring a name as the first one
name=$1
if [[ -z $name ]]
then
    echo "$usage"
    exit 1
fi

# assemble the greeting, then print it
greeting=''
if [[ -n $beSnarky ]]
then
    greeting+="Oi"
else
    greeting+="Hi"
fi
greeting+=" $adjective $name"'!'
echo "$greeting"

You can see the results from running the scripts with various arguments below:

./pbs149e-getoptsArgs.sh
> Usage: pbs149e-getoptsArgs.sh [-s] [-a ADJECTIVE] NAME
./pbs149e-getoptsArgs.sh Bart
> Hi brilliant Bart!
./pbs149e-getoptsArgs.sh -s Bart
> DEBUG enabling snark
> Oi brilliant Bart!
./pbs149e-getoptsArgs.sh -s -a Brainy Bart
> DEBUG enabling snark
> DEBUG saving custom adjective 'Brainy'
> Oi Brainy Bart!
./pbs149e-getoptsArgs.sh -s -b Brainy Bart
> DEBUG enabling snark
> Usage: pbs149e-getoptsArgs.sh [-s] [-a ADJECTIVE] NAME

A little bonus tip — you can cuddle one optional argument with arbitrarily many flags as long as the optional argument is the last one in the cuddle, e.g. ./pbs149e-getoptsArgs.sh -sa Brilliant Bart works just the same as ./pbs149e-getoptsArgs.sh -s -a Brilliant Bart.
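If you'd like to convince yourself of that without the DEBUG noise, the sketch below wraps the same option string in a function and prints what was parsed. The function name summarise and its output format are my own inventions for this example.

```bash
#!/usr/bin/env bash

# summarise is a hypothetical helper used only for this sketch
summarise() {
    # getopts tracks its position in OPTIND, so reset it for each call
    local OPTIND=1 opt snark='no' adjective='brilliant'
    while getopts ':sa:' opt
    do
        case $opt in
            s) snark='yes' ;;
            a) adjective="$OPTARG" ;;
            *) echo 'bad option'; return 1 ;;
        esac
    done

    # remove the options, leaving the name as the first arg
    shift "$((OPTIND - 1))"
    echo "snark=$snark adjective=$adjective name=$1"
}

# separate options and cuddled options give identical results
summarise -s -a Brilliant Bart
summarise -sa Brilliant Bart
```

Both calls print snark=yes adjective=Brilliant name=Bart.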

An Important Subtlety — -- Indicates the end of the Options

Note: this sub-section, and the one following, were added in August 2024 based on listener feedback, so they are not included in the audio recording.

In the real world, this is a surprisingly rare edge case, but what happens when you use getopts in a script that expects both options and regular arguments and the user wants to pass a regular argument with a value that starts with a - symbol? How does getopts know the user did not mean that thing that starts with a - to be an option?

Well, getopts can’t possibly know what the user means in that situation, so it assumes the intended regular argument is another option, as we can see with our previous example script:

./pbs149e-getoptsArgs.sh -BART-
> Usage: pbs149e-getoptsArgs.sh [-s] [-a ADJECTIVE] NAME

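This fails because the -BART- gets interpreted as a list of cuddled boolean options, and the very first one, B, is not a valid option, so the script prints its usage message and exits. In other words, it’s interpreted as if we typed: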

./pbs149e-getoptsArgs.sh -B -A -R -T --

So what’s the solution? Is it just impossible to pass a regular argument that starts with a - symbol?

Of course not. The developers of getopts thought of this eventuality and provided a solution in the form of a standard option that means there are no more options, the rest are regular arguments. This standard option is --.

So, whenever you need to pass a regular argument with a value starting with a - symbol to a script that uses getopts, you need to explicitly end the options list with the -- option.

Let’s demonstrate that with our previous example script:

# Works because the -- tells getopts to stop looking for options immediately
./pbs149e-getoptsArgs.sh -- -BART-
> Hi brilliant -BART-!

# Works because the -- tells getopts to stop looking for options after the -a option
./pbs149e-getoptsArgs.sh -a Cool -- -BART-
> DEBUG saving custom adjective 'Cool'
> Hi Cool -BART-!
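Notice that the script needed no extra code for this: getopts stops at the --, and $OPTIND ends up pointing past it, so the usual shift removes the -- along with the options. The following sketch shows that in a function; first_name is a made-up name, and the options are accepted but ignored to keep the example short.

```bash
#!/usr/bin/env bash

# first_name is a hypothetical helper used only for this sketch
first_name() {
    # getopts tracks its position in OPTIND, so reset it for each call
    local OPTIND=1 opt
    while getopts ':sa:' opt
    do
        case $opt in
            s|a)
                # accept the options but ignore them in this sketch
                ;;
            *)
                echo "rejected option -$OPTARG"
                return 1
                ;;
        esac
    done

    # $OPTIND already points past the --, so this removes it too
    shift "$((OPTIND - 1))"
    echo "name is '$1'"
}

first_name -BART- || true
first_name -- -BART-
first_name -s -- -BART-
```

The first call is rejected at the B, while the other two both report the name as -BART-.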

The -- Option is Very Common in Terminal Commands

Because the equivalent getopt function is part of the standard POSIX C library, and the POSIX utility conventions reserve -- as the end-of-options marker, many if not most terminal commands support the -- option too. So when you use getopts in your scripts, you’re not adding exotic behaviour, you’re adding standard behaviour.

Imagine we wanted to create a folder named a with a file named test.txt and we then wanted to list the contents of that folder. We could do so like this:

mkdir a
touch a/test.txt
ls a

This would successfully create the folder, add the file (as an empty file) and we’d see that file listed as the output from the final ls command.

Now let’s imagine that, for some reason, we wanted to name our folder -a instead, you might imagine the following would work:

mkdir -a
touch -a/test.txt
ls -a

But it would not: all three of these standard terminal commands will interpret the -a as an option, not as the first argument. mkdir doesn’t have a -a option, and touch trips over the characters cuddled after its own -a option, so both give errors, but ls does have one, so it shows you all the files, including the hidden ones, in the current directory.

To actually create the folder, add the file, and list the contents of that folder, we need to use the -- option with all three commands:

mkdir -- -a
touch -- -a/test.txt
ls -- -a
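If you want to try this without littering your current folder with an awkwardly named directory, the sketch below runs the same three commands in a throw-away directory created with mktemp and then cleans up. The function name dash_dir_demo is made up for this example.

```bash
#!/usr/bin/env bash

# dash_dir_demo is a hypothetical helper used only for this sketch
dash_dir_demo() {
    # create a throw-away directory to work in
    local scratch
    scratch=$(mktemp -d) || return 1

    # the parentheses run the commands in a sub-shell, so the cd
    # does not change the directory of the calling shell
    (
        cd "$scratch" || exit 1
        mkdir -- -a
        touch -- -a/test.txt
        ls -- -a
    )

    # clean up, using -- here too as a good habit
    rm -rf -- "$scratch"
}

dash_dir_demo
# prints: test.txt
```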

Final Thoughts

We’re getting very close to being able to write scripts that fit in perfectly on the terminal. We’re using exit codes to indicate success or failure and we’re using flags and optional arguments for customising our scripts’ behaviour. What we’re still missing is the ability to play nice in the pipeline, that is to say, to allow our scripts to be used with the terminal’s standard IO-redirection feature, or the terminal plumbing as I like to call it. That’s where we’re going next.

An Optional Extra Bonus Challenge

In the meantime, if you want some more Bash practice, update your solution to the previous challenge to convert the optional argument for specifying a limit to a -l optional argument, and add a -s flag to enable snarky output (like the infamous Carrot weather app for iOS does).

Join the Community

Find us in the PBS channel on the Podfeet Slack.
