PBS 149 of X: Better Arguments with POSIX Special Variables and Options (Bash)
Before we get started, we want to make sure everyone knows that a link to the shownotes for every episode are in the podcast feed. We also want to make sure you know that you can join in the conversation with other Programming By Stealth learners by going to the Podfeet Slack at podfeet.com/slack and joining the #pbs channel.
In this instalment we’re going to circle back to accepting command line arguments in our scripts, and fill in the advanced feature and best practices we didn’t cover in our first look at the topic.
To do that we need to formally acknowledge something we’ve been using implicitly — the POSIX standard. In particular, the special variables the standard requires all compliant shells (including sh
& Bash) support.
We’ll end the instalment by looking at Bash’s built-in support for optional arguments via the getopts
keyword.
Matching Podcast Episode
Listen along to this instalment on episode 765 of the Chit Chat Across the Pond Podcast.
You can also Download the MP3
Read an unedited, auto-generated transcript: CCATP_2023_04_15
Episode Resources
- The instalment ZIP file — pbs149.zip
PBS 148 Challenge Solution
The optional challenge at the end of the previous instalment was to update our breakfast menu script to it can optionally accept an argument — a limit on the number of items you can order.
Here’s my sample solution, which you’ll find in the instalment ZIP as pbs148-challengeSolution.sh
:
#!/usr/bin/env bash
# read the menu
declare -a menu
while read -r menuLine
do
# skip comment lines
echo "$menuLine" | egrep -q '^[ ]*#' && continue
# skip empty lines
echo "$menuLine" | egrep -q '^[ ]*$' && continue
# store the menu item
menu+=("$menuLine")
done <<<"$(cat $(dirname "$BASH_SOURCE")/menu.txt)"
# default to unlimited items, then check if there is a first argument
limit=''
if [[ -n $1 ]]
then
# validate the argument
if echo "$1" | egrep '^[1-9][0-9]*$'
then
limit=$1
else
echo "invalid argument '$1' - must be a whole number greater than 0"
exit 2 # custom exit code for bad arg
fi
fi
# create an empty array to hold the order
declare -a order
# present the menu, with a done option
if [[ -z $limit ]]
then
echo 'Choose your breakfast (as many items as you like)'
else
echo "Choose up to $limit breakfast items"
fi
select item in done "${menu[@]}"
do
# skip invalid selections ($item is empty)
[[ -z $item ]] && continue
# exit if done
[[ $item == done ]] && break
# store and print the item
order+=("$item")
echo "Added $item to your order"
# if we're limiting, check the limit
if [[ -n $limit ]]
then
[[ ${#order[@]} -ge $limit ]] && break
fi
done
# print the order
echo -e "\nYou ordered the following ${#order[@]} items:"
for item in "${order[@]}"
do
echo "* $item"
done
This solution doesn’t use anything we’ve not technically covered, but there are two related things I’d like to draw your attention to.
Firstly, I think this is the first time my examples have made use of the -n
test for not zero-length, it’s basically the inverse of -z
which tests for a length of zero. I first use it to see if an optional first argument ($1
) was passed:
if [[ -n $1 ]]
Secondly, I chose to use a blank limit to represent an absence of any limit. The reason is quite simple, it allows both -z
and -n
to be used to quickly test for the presence or absence of a limit, e.g.:
# if there's no limit
if [[ -z $limit ]]
then
echo 'Choose your breakfast (as many items as you like)'
else
echo "Choose up to $limit breakfast items"
fi
# …
# if there is a limit
if [[ -n $limit ]]
then
[[ ${#order[@]} -ge $limit ]] && break
fi
Special POSIX Variables
Before we can move on to learning how to add traditional terminal-style flags to our own scripts, we need a more detailed understanding of how Bash handles arguments, and before we can do that, we need to formally introduce a concept we’ve already met informally — the special POSIX variables that are available in all POSIX compliant shells, including Bash.
We’ve actually already seen some of these variables, but we’ve not mentioned that they’re bigger than Bash, that they’re actually part of the much more general POSIX standard. Here’s a table of the most important special POSIX variables):
POSIX Variable | Description |
---|---|
$PATH |
A list of locations to search for executables (see Taming the Terminal Part 13) |
$HOME 🆕 |
The path to the current user’s home directory |
$IFS 🆕 |
The Input Field Separator is a powerful but dangerous variable that should be used with extreme caution because it subtly affects the operation of multiple commands, so it’s very prone to the kind of spooky action at a distance bugs that will drive you nuts! We may cover it later in the series. |
$? |
The exit code of the previously executed command |
$$ 🆕 |
The process ID (PID) of the currently running script |
$! 🆕 |
The process ID (PID) of the most recently started background task (we’re going to ignore this one) |
$0 🆕 |
The name of the currently running script |
$1 … $9 |
The positional arguments |
$# 🆕 |
The number of arguments passed to the script |
$* 🆕 |
A pseudo-array of all the arguments concatenated, then split on spaces (details to follow) |
$@ 🆕 |
A pseudo-array of all the arguments (details to follow) |
If you’re curious to learn more about the POSIX standard, this extended tutorial is very comprehensive.
I want to focus on the symbol-style POSIX variables in this instalment, and most especially, the three argument-related ones we’ve not met before — $#
, $*
, and $@
.
The Number of Args ($#
)
By far the simplest of these three to understand is $#
, it’s just a number, 0 if there were no arguments passed, 1, if there was one passed, and so on. You can see it in action in the very simple script pbs149a-numArgs.sh
in the instalment ZIP:
#!/usr/bin/env bash
echo "You passed $# args"
If you call the script without args you’ll see it return 0
, if you cll the script with just the single argument 42 you’ll see it returns 1
, and if you call the script with a quoted argument, it does the sensible thing:
./pbs149a-numArgs.sh 42 'life universe everything'
# returns: You passed 2 args
The $*
& $@
Expansions
These variables are weird! They’re referred to as expansions, because like *.txt
they expand out into a list of arbitrarily many arguments. The two are almost the same, but they differ in a subtle but very important way — they split the args up differently. $*
also has the extra complexity that it’s behaviour is altered by the input field separator ($IFS
), so it’s vulnerable to spooky action at a distance. To see how they differ, let’s use a simple demo script (pbs149b-argsExpansions.sh
in the instalment ZIP):
#!/usr/bin/env bash
echo "You passed $# args"
echo ''
echo 'Un-quoted $* expands them to:'
for a in $*
do
echo "* $a"
done
echo ''
echo 'Un-quoted $@ expands them to:'
for a in $@
do
echo "* $a"
done
echo ''
echo 'Quoted $* expands them to:'
for a in "$*"
do
echo "* $a"
done
echo ''
echo 'Quoted $@ expands them to:'
for a in "$@"
do
echo "* $a"
done
When we run this script with a simple arg and a quoted arg with spaces like so:
./pbs149b-argsExpansions.sh 42 'life universe everything'
We get the following behaviour for the four ways to use these two variables:
You passed 2 args
Un-quoted $* expands them to:
* 42
* life
* universe
* everything
Un-quoted $@ expands them to:
* 42
* life
* universe
* everything
Quoted $* expands them to:
* 42 life universe everything
Quoted $@ expands them to:
* 42
* life universe everything
Looking at those outputs, the vast majority of the time, it’s the final option, "$@"
that gives the behaviour you’ll want 99.9% of the time.
Arguments → Bash Arrays
Finally, you’ll often want to access your arguments as traditional array. You can do that by combining Bash’s array creation syntax with the "$@"
POSIX args expansion, i.e. ("$@")
, as demonstrated in pbs149c-argsArray.sh
in the installment ZIP.
If we run this script with the simple and quoted example args we’ve been using in the previous examples:
#!/usr/bin/env bash
# store the passed args in an array named $args (could be any name)
args=("$@")
# demo array
echo "The array \$args contains ${#args[@]} items:"
for a in "${args[@]}"
do
echo "* $a"
done
We can see the args have been properly captured in the array, one array element per arg:
The array $args contains 2 items:
* 42
* life universe everything
Command line Options with getopts
With all that ground work laid, we’re ready to use Bash’s built-in getopts
keyword to add support for command line options to our scripts.
Don’t confuse Bash’s built-in
getopts
keyword with the older and more difficult to usegetopt
command that ships with some Linux/Unix distros.
We can use getopts
to handle two distinct types of option:
- Boolean Options (present/absent) AKA flags
- Optional Arguments which require a value
getopts
uses the long-standing tradition of marking both kinds of options using arguments that start with the -
symbol. At the terminal level, a boolean option is a single argument, while an optional argument is actually two arguments, the name of the option, and its value.
Flag Options
Especially if you’ve followed Taming the Terminal, you’ll have seen lots of example of command-line flag options, they are simple single letters pre-fixed with a minus, when they’re present one thing happens, when they’re not, something else does.
The example you’ve probably seen most often is the -l
flag for the ls
command — when absent the files are listed in a multi-column view, when present the files are listed in a list view, one file line per line.
Something else you often see, and that getopts
support is flag cuddling, multiple flags can be combined into a single command line argument. Again, ls
serves as a common example, what does ls -al
mean? It’s actually a cuddled shortcut for ls -a -l
, i.e., ls
with two flags, -a
to show all files, and -l
.
Optional Arguments
An optional argument is like a flag, but it has both a single-letter name and a value, so it gets passed as two arguments, the name (prefixed with a -
), and the value. An example would be the -p
optional argument for specifying a non-standard port when using the ssh
command, e.g.:
ssh -p 2222 someuser@some.server
Some people wrongly refer to optional arguments as flags, that’s not correct. What makes a flag a flag is that it has no value, it’s just there or not there.
Some Limitations to getopts
Note that getopts
has been around for a very long time, it actually dates back to the original Bourne shell (sh
), so it doesn’t have all the modern bells and whistles. Its two biggest shortcoming are:
- No support for long options — you can only have 62 single-letter/digit flags (they are case-sensitive), and you only have two obvious choices for words starting with an given letter (e.g.
-a
and-A
for an archive option), so what do you do when you need three flags who’s full meaning starts with the same letter? If your script can optionally add, archive, and append, what do you do? This is why many programming languages now provide support for a new convention, long options, where two minuses are prefixed to multi-letter names, e.g. thegit checkout
subcommand supports the--track
long flag specify that a tracking relationship should be added while checking out the remote branch. - Options must precede regular args — many more advanced programming languages allow for regular arguments and optional arguments to be all mixed up together, some arguments, then some flags, then some more arguments, then an optional argument, etc..
getopts
is not that clever, it’s always options then arguments.
Using getopts
The first thing to say is, getopts
is a little weird! It processes your arguments one option at a time, so you have to use it in a loop, and it uses three variables to store its state, and its exit code to indicate whether or not its finished.
Let’s start with those three variables:
- You need to tell
getopts
the name of the variable it should store the name of the flag it just processed in. That means all the examples you see on the web will vary slightly as each author chooses their own name for this vital variable. For clarity, I’m going to always useopt
, which is a common convention. - The value associated with the most recently found option will always be stored in
$OPTARG
- The index of the next argument to be processed by
getopts
is stored in$OPTIND
What arguments does getopts
need? The first thing it needs to be told is what options it should consider valid. It accepts that information in the form of a single string. You add each letter you want to accept as an option name to the string, and if the option is an optional argument (as opposed to a flag), you add a :
after the option name. Then, just to mess with all our heads, someone decided the :
should have a totally different meaning when added as the first character in the string, when you do that, it suppresses the error messages getopts
outputs by default. It’s generally considered better to use your own error messages, so I suggest always starting your options list with a :
. So, to accept a flag named f
and an option named o
with errors suppressed, the option string is :fo:
. The first :
is the no errors please indicator, the f
specifies a flag (no value) named f
, and the o:
an optional argument (has a value) named o
.
The only other argument that’s required is the name of the variable you want getopts
to store the name (single letter) of the flag or optional argument it just found. Note that this variable will always contain either a single letter, or the symbol ?
if the user specified either an unknown option. You need to check for ?
if you want to properly handle errors.
So, we have a command that updates three variables, steps through your arguments one option at a time, and needs you to check the contents of a variable to know which option was found. That leads to an obvious design pattern — getopts
is always used with a while
loop that uses a case
conditional to apply the appropriate action.
This is all much harder to describe than to show, so let’s look at pbs149d-getoptsBasic.sh
in the instalment ZIP:
#!/usr/bin/env bash
beSnarky='' # assume -s is not passed
worldAdjective='wonderful' # default adjective for the world
# accept a flag named s to request snark, and an optional
# adjective w to describe the world
# save the name of the matched option in $opt (our choice of name)
while getopts ':sw:' opt
do
case $opt in
s)
# enable snark!
echo 'DEBUG enabling snark'
beSnarky=1
;;
w)
# store the the adjective
echo "DEBUG saving custom adjective '$OPTARG'"
worldAdjective="$OPTARG"
;;
?)
# render a sane error, then exit
echo "Usage: $(basename $0) [-s] [-w ADJECTIVE]"
exit 1
;;
esac
done
# assemble the greeting, then print it
greeting=''
if [[ -n $beSnarky ]]
then
greeting+="Oi you"
else
greeting+="Hello"
fi
greeting+=" $worldAdjective world"'!'
echo "$greeting"
This script defaults to printing the traditional ‘Hello world’ greeting, but with a twist. Firstly, it uses an adjective to describe the world, which defaults to wonderful, and supports a -w
optional argument for specifying an alternative adjective. Secondly, it can become optionally snarky with the -s
flag.
The key point to note is that we initialise some variables to a default state, then we process our options to update the state of those variables, then we do what ever it is the script is supposed to do. As described, we use a combination of while
, getopts
, and case
to process the options.
On a side note, I chose to use a boiler-plate error message that uses the POSIX special variable $0
to include the script’s name, stripped of its preceding path with the use of the basename
command, in the error message.
You can see the results from calling the script with various arguments below:
./pbs149d-getoptsBasic.sh
> Hello wonderful world!
./pbs149d-getoptsBasic.sh -s
> DEBUG enabling snark
> Oi you wonderful world!
./pbs149d-getoptsBasic.sh -s -w beautiful
> DEBUG enabling snark
> DEBUG saving custom adjective 'beautiful'
> Oi you beautiful world!
./pbs149d-getoptsBasic.sh -a
> Usage: pbs149d-getoptsBasic.sh [-s] [-w ADJECTIVE]
Options and Arguments
We’ve almost learned everything we need to start being productive with getopts
, but we’re missing one important piece — how do we handle regular arguments in conjunction with options?
Because options are just arguments with a special naming convention, they appear in the $*
and $@
POSIX expansions, and in the $1
to $9
positional argument variables, so if we wanted to add a required argument for the user’s name to our script, the variable that contains the name could be $1
if neither the -s
nor -w
options were used, $2
if just the -s
flag was used, $3
if just the -w
optional argument was used, and $4
if both -s
and -w
were used!
There must be a solution for this mess? Of course there is, we can use the shift
built-in keyword to slide all the arguments over by the appropriate amount to effectively discard the options once they’ve been processed. When you call shift
with no arguments $*
, $@
and the $1
to $9
variables all get updated as if the first argument had never existed. If you pass a number as the first argument, shift
performs that task that many times.
We could do the shifting inside the case
statement, by calling shift
in each flag’s case, and shift 2
in each optional argument’s case, but that would be very messy. Instead, we can make use of the $OPTIND
variable which getopts
uses to track its position in the argument list. When the while loop finishes $OPTIND
will contain the index of the first non-option argument, we need to remove all the arguments that came before that one, so that means we need to remove one less than that index, i.e. after our while
loop we need to remove one less than that $OPTIND arguments. Using just the syntax we’ve met to date, we can do that with:
shift $(echo "$OPTIND-1" | bc)
Here we’re using the bc
(basic calculator) command to do the math, but there is a more efficient way. For now, please take it on faith that the best way of shifting away the options is with:
shift "$(($OPTIND -1))"
Note that we’ve not yet discussed what (( ))
does, which is why this command doesn’t make sense just yet.
Again, this is easier to show than to say, so let’s look at pbs149e-getoptsArgs.sh
in the instalment ZIP:
#!/usr/bin/env bash
beSnarky='' # assume -s is not passed
adjective='brilliant' # default adjective for the person
# store the usage string
usage="Usage: $(basename $0) [-s] [-a ADJECTIVE] NAME"
# accept a flag named s to request snark, and an optional
# adjective a to describe the user
# save the name of the matched option in $opt (our choice of name)
while getopts ':sa:' opt
do
case $opt in
s)
# enable snark!
echo 'DEBUG enabling snark'
beSnarky=1
;;
a)
# store the the adjective
echo "DEBUG saving custom adjective '$OPTARG'"
adjective="$OPTARG"
;;
?)
# render a sane error, then exit
echo "$usage"
exit 1
;;
esac
done
# remove the options from the args
shift $(echo "$OPTIND-1" | bc)
# process the remaining args, requiring a name as the first one
name=$1
if [[ -z $name ]]
then
echo "$usage"
exit 1
fi
# assemble the greeting, then print it
greeting=''
if [[ -n $beSnarky ]]
then
greeting+="Oi"
else
greeting+="Hi"
fi
greeting+=" $adjective $name"'!'
echo "$greeting"
You can see the results from running the scripts with various arguments below:
./pbs149e-getoptsArgs.sh
> Usage: pbs149e-getoptsArgs.sh [-s] [-a ADJECTIVE] NAME
./pbs149e-getoptsArgs.sh Bart
> Hi brilliant Bart!
./pbs149e-getoptsArgs.sh -s Bart
> DEBUG enabling snark
> Oi brilliant Bart!
./pbs149e-getoptsArgs.sh -s -a Brainy Bart
> DEBUG enabling snark
> DEBUG saving custom adjective 'Brainy'
> Oi Brainy Bart!
./pbs149e-getoptsArgs.sh -s -b Brainy Bart
> DEBUG enabling snark
> Usage: pbs149e-getoptsArgs.sh [-s] [-a ADJECTIVE] NAME
A little bonus tip — you can cuddle one optional argument with arbitrarily many flags as long as the optional argument is the last one in the cuddle, e.g. ./pbs149e-getoptsArgs.sh -sa Brilliant Bart
works just the same as ./pbs149e-getoptsArgs.sh -s -a Brilliant Bart
.
An Important Subtlety — --
Indicates the end of the Options
Note: this sub-section, and the one following, were added in August 2024 based on listener feedback, they are not included in the audio recording.
In the real world, this is a surprisingly rare edge case, but what happens when you use getopts
in a script that expects both options and regular arguments and the user wants to pass a regular argument with a value that starts with a -
symbol? How does getopts
know the user did not mean that thing that starts with a -
to be an option?
Well, getopts
can’t possibly know what the user means in that situation, so it assumes the intended regular argument is another option, as we can see with our previous example script:
./pbs149e-getoptsArgs.sh -BART-
> Usage: pbs149e-getoptsArgs.sh [-s] [-a ADJECTIVE] NAME
This fails because the -BART-
gets interpreted as a list of cuddled boolean options, i.e. it’s interpreted as if we typed:
./pbs149e-getoptsArgs.sh -B -A -R -T --
So what’s the solution? Is it just impossible to pass a regular argument that starts with a -
symbol?
Of course not, the developers of getopts
thought of this eventuality and provided a solution in the form of a standard option that means there are no more options, the rest are regular arguments. This standard option is --
.
So, when ever you need to pass a regular argument with a value starting with a -
symbol to a script that uses getopts
, you need to explicitly end the options list with the --
option.
Let’s demonstrate that with our previous example script:
# Works because the -- tells getopts to stop looking for options immediately
./pbs149e-getoptsArgs.sh -- -BART-
> Hi brilliant -BART-!
# Works because the -- tells getopts to stop looking for options after the -a option
./pbs149e-getoptsArgs.sh -a Cool -- -BART-
> DEBUG saving custom adjective 'Cool'
> Hi Cool -BART-!
The --
Option is Very Common in Terminal Commands
Because getopts
is available as part of the standard POSIX C library, many if not most, terminal commands support/require the --
option too. So when you use getopts
in your scripts, you’re not adding exotic behaviour, you’re adding standard behaviour.
Imagine we wanted to create a folder named a
with a file named test.txt
and we then wanted to list the contents of that folder. We could do so like this:
mkdir a
touch a/test.txt
ls a
This would successfully create the folder, add the file (as an empty file) and we’d see that file listed as the output from the final ls
command.
Now let’s imagine that, for some reason, we wanted to name our folder -a
instead, you might imagine the following would work:
mkdir -a
touch -a/test.txt
ls -a
But it would not, all three of these standard terminal commands will interpret the -a
as an option, not as the first argument. The first two commands don’t have a -a
option, so they give errors to that effect, but ls
does, so it shows you all the files, including the hidden ones, in the current directory.
To actually create the folder, add the file, and list the contents of that folder, we need to use the --
option with all three commands:
mkdir -- -a
touch -- -a/test.txt
ls -- -a
Final Thoughts
We’re getting very close to being able to write scripts that fit in perfectly on the terminal. We’re using exit codes to indicate success or failure and we’re using flags and optional arguments for customising our scripts behaviours. What we’re still missing is the ability to play nice in the pipeline, that is to say, to allow our scripts to be used with the terminal’s standard IO-redirection feature, or the terminal plumbing as I like to call it. That’s where we’re going next.
An Optional Extra Bonus Challenge
In the meantime, if you want some more Bash practice, update your solution to the previous challenge to convert the optional argument for specifying a limit to a -l
optional argument, and add a -s
flag to enable snarky output (like the infamous Carrot weather app for iOS does).