PBS 160 of X: jq as a Programming Language (jq)

17 Feb 2024

xkpasswd-js

Allison here, interrupting the shownotes for an announcement. Helma van der Linden, who you may remember from her hosting of PBS 129 teaching us how to use ESLint, has done something extraordinary. She has ported Bart’s xkpasswd password generation service from perl to JavaScript. This is a project we’ve been talking about doing as a community for a very long time. With Bart’s blessing (which he says she didn’t need because it’s open source), she spent her Christmas holiday getting the code to a minimum-viable product

I had Helma on Chit Chat Across the Pond Lite to tell the story of where we started, and how she accomplished this feat and I think the Programming By Stealth audience will love the conversation.

CCATP #785 — Helma van der Linden on Porting XKPASSWD from Perl to JavaScript

If you’d like to give the very beta version of the new tool a try (without knowing any coding), check it out at bartificer.github.io/xkpasswd-js/. In a few days, Bart will have it up as the beta version of the real xkpasswd at beta.xkpasswd.net. This beta version is not feature-complete, but it allows you to create 1-10 passwords that use the default preset from the original xkpasswd. You can’t choose different presets, and you can’t make customized passwords, but at least it does create long, strong, memorable, and typable passwords. And it’s REALLY pretty!

We end with the call for others to come help work on the code. The GitHub repo is at github.com/bartificer/xkpasswd-js. If you have or create a GitHub account, you can contribute to the project. If you don’t have programming skills but you have feature requests, it counts as contributing if you use the “issues” tab for the GitHub project to post your feature request.

Now we’ll go back to our regularly scheduled programming with Bart!

In this instalment I want to take a break from learning about new ways the jq language can process data, and learn about some more jq command and jq language features designed to make it easier to solve complex problems. So far we’ve been writing all our jq filters as single arguments to the jq terminal command. This means we need to write our filters on single lines, no matter how complex they are. This simply does not scale! So, before we go on to create even more complex jq filters, we need to pause to learn some better techniques for developing filters.

Matching Podcast Episode

Listen along to this instalment on episode 787 of the Chit Chat Across the Pond Podcast.

You can also Download the MP3

Read an unedited, auto-generated transcript with chapter marks: CCATP_2024_02_07

Installment Resources

The instalment ZIP file — pbs160.zip

PBS 159 Challenge Solutions

The challenge set at the end of the previous instalment was to build a jq command to transform the Nobel Prize data set from NobelPrizes.json in the installment ZIP into a simplified data structure and save it to a new file NobelPrizeList.json in un-pretty-printed JSON format. The simplified file should consist of an array of dictionaries, one for each prize that was actually awarded, and each dictionary should have just the keys year, prize, numWinners, and winners where the latter is an array of names as strings.

To build up to a solution, let’s start by building a command to return all prizes that were actually awarded. We know a prize was awarded when the dictionary for the prize contains an array named laureates, so let’s use the type command inside a select to filter the list of all prizes down to just those that have a laureates array:

jq '.prizes[] | select((.laureates | type) == "array")' NobelPrizes.json

We can now add a final filter to the chain to assemble our desired simplified dictionary, starting with the three simple keys:

jq '.prizes[] | select((.laureates | type) == "array") | {year: (.year | tonumber), prize: .category, numWinners: (.laureates | length)} ' NobelPrizes.json

The final key, the winners array is a little more complicated, but only a little. We need to assemble an array, so we will need to wrap the filter for generating the names in square brackets. Then, we just need to generate the names, which we can do by exploding the laureates array and building the name with string interpolation ([.laureates[] | "\(.firstname) \(.surname)"]). Adding all that into our command it now looks like this:

jq '.prizes[] | select((.laureates | type) == "array") | {year: (.year | tonumber), prize: .category, numWinners: (.laureates | length), winners: [.laureates[] | "\(.firstname) \(.surname)"]}' NobelPrizes.json

We’re getting pretty close — we can see sensible results for prizes won by humans:

{
  "year": 1903,
  "prize": "physics",
  "numWinners": 3,
  "winners": [
    "Henri Becquerel",
    "Pierre Curie",
    "Marie Curie"
  ]
}

Unfortunately, we’re not getting good results for prizes won by organisations:

{
  "year": 1904,
  "prize": "peace",
  "numWinners": 1,
  "winners": [
    "Institute of International Law null"
  ]
}

Since there is no surname field, the string "null" is getting added after the firstname field. We can use the alternate operator (//) inside the string interpolation to add the surname or an empty string — "\(.firstname) \(.surname // "")". Adding that into our command we now have:

jq '.prizes[] | select((.laureates | type) == "array") | {year: (.year | tonumber), prize: .category, numWinners: (.laureates | length), winners: [.laureates[] | "\(.firstname) \(.surname // "")"]}' NobelPrizes.json

This gets us a lot closer to what we want:

{
  "year": 1904,
  "prize": "peace",
  "numWinners": 1,
  "winners": [
    "Institute of International Law "
  ]
}

Now we just need to remove the trialing space, and we can do that with the rtrimstr function — rtrimstr(" "), so our command now looks like this:

jq '.prizes[] | select((.laureates | type) == "array") | {year: (.year | tonumber), prize: .category, numWinners: (.laureates | length), winners: [.laureates[] | "\(.firstname) \(.surname // "")" | rtrimstr(" ")]}' NobelPrizes.json

Each individual output is now correct, even those for organisations:

{
  "year": 1904,
  "prize": "peace",
  "numWinners": 1,
  "winners": [
    "Institute of International Law"
  ]
}

The next problem to solve is that we have a list of dictionaries, not an array of dictionaries. This is easy to fix, simply wrap the entire filter in square braces!

jq '[.prizes[] | select((.laureates | type) == "array") | {year: (.year | tonumber), prize: .category, numWinners: (.laureates | length), winners: [.laureates[] | "\(.firstname) \(.surname // "")" | rtrimstr(" ") ]}]' NobelPrizes.json

We’re very nearly at a full-credit soluition now, we just need to encode it as JSON data set rather than pretty-printed JSON. We can do that by adding a final encoding filter to the chain (@json), enabling raw output with the -r flag, and finally using some terminal plumbing to redirect the output to a file:

jq -r '[.prizes[] | select((.laureates | type) == "array") | {year: (.year | tonumber), prize: .category, numWinners: (.laureates | length), winners: [.laureates[] | "\(.firstname) \(.surname // "")" | rtrimstr(" ") ]}] | @json' NobelPrizes.json > NobelPrizeList.json

The above solution gets full credit, but there was a bonus on offer if, instad of removing the trailing space after it was added, we avoided it ever getting added in the first place. A hint was given that this could be achieved by combining the alternate operator //, the join function, and a function we’d not yet met named empty that takes no arguments and produces absolute no output of any kind.

This is the full documentation for the empty function:

empty returns no results. None at all. Not even null.

It’s useful on occasion. You’ll know if you need it :)

I wasn’t sure if we ever would need it, or where it might fit into our series, but it turns out we do need it, and this is the moment when it proves useful!

Rather than building the name as a string, we can create an array with the firstname and optionaly the surname, and then join that array using a single space as the separator. Creating an array with the firstname and surname is easy — [.firstname, .surname]. The problem is that this will always result in a two-entry array, either two strings, or one string and a null. We need to use the alternate operator in conjunction with the empty function to completely omit the surname from the array if there is no surnameb — [.firstname, .surname//empty]. Finally, we need to join this array with a single space — [.firstname, .surname//empty] | join(" ").

Substituting in this logic to our previous solution we get the following for the bonus credit:

jq -r '[.prizes[] | select((.laureates | type) == "array") | {year: (.year | tonumber), prize: .category, numWinners: (.laureates | length), winners: [.laureates[] | [.firstname, .surname // empty] | join(" ")]}] | @json' NobelPrizes.json > NobelPrizeList.json

Allison solved the problem a bit differently. For the extra credit portion she used the has command to determine whether the laureate had a surname, and used the alternate operator // to use only the first name if there was no surname. This eliminated the need to remove any trailing spaces because the missing surname was never used.

jq '[.prizes[] | select(has("laureates")) | {year: (.year | tonumber), prize: .category, numWinners: (.laureates | length), winners: [.laureates[]? | (select (has("surname")) | "\(.firstname) \(.surname)") // "\(.firstname)"]}]' NobelPrizes.json

Running jq Filters from Files

The first step to treating jq as a ‘real’ programming language is to switch from writting jq filters as one-line arguments, and to save them to files like we would for any other scripting language.

As a first example, I copied the jq filter from the bonus challenge solution, unchanged, into a file named pbs160a-1.jq in the Installment Zip. We can now run it using the --from-file (or -f) flag to read the filter from the file:

jq -f pbs160a-1.jq NobelPrizes.json

Having one hard to read line of text in a file is not really much better than having one hard to read line of text on the terminal, so how can we re-format this example to make it more human-friendly?

Comments

Like in shell scripts, # starts a comment, with the rest of the line being ignored. So, we can add a comment to the top of our file like so:

# This jq script re-factors the Nobel Prizes data set as published by the Nobel
# prize committee into a simpler form.
# Input:    JSON as published by the Nobel Committee
# Output:   Simplified JSON

Code Layout

If you think about the jq syntax, filters are separated from each other by the pipe symbol (|) when you want to chain them, and the comma (,) when you want to simply separate them. That means newline characters have no special meaning when it comes to starting or ending a filter, so you are free to break your code over multiple lines how ever you like. I tend to take the following approach:

Start non-trivial filters (more than just something very simple like a key name as a function argument) on a new line.
Start and end large arrays and dictionaries on new lines, and indent their contents.
Pre-fix filter separators (| & ,) to the start of lines

I have not found an official style guide for jq, so this is not some kind of best practice. Instead, this approach lines up well with the official style guides for two data querying languages I have experience with, Microsoft’s KQL, and Splunk’s SPL.

Applying that logic, I can rewrite our example file like so (included in the instalment ZIP as pbs160a-2.jq):

# This jq script re-factors the Nobel Prizes data set as published by the Nobel
# prize committee into a simpler form.
# Input:    JSON as published by the Nobel Committee
# Output:   Simplified JSON
[
    .prizes[]
    | select((.laureates | type) == "array")
    | {
        year: (.year | tonumber),
        prize: .category,
        numWinners: (.laureates | length),
        winners: [
            .laureates[]
            | [ .firstname, .surname // empty ]
            | join(" ")
        ]
    }
]
| @json

Note that this makes it much easier to see the structure of the dictionaries we are constructing. Also note that when filters are very short, and when it makes their meaning clearer, I do not break them into new lines, even when they are multi-stage pipe-lines, e.g. I keep the definitions for the year and numWinners keys on a single line (year: (.year | tonumber), & numWinners: (.laureates | length),).

As I write this instalment (January 2024), my preferred free and open source code editor for all the programming I do is Microsoft’s VS Code. There are two very useful jq-related plugins I install into VS Code.

The most important one is jq Syntax Highlighting which teaches VS Code how to syntax highlight .jq files.

The second is vscode-jq, a very useful plugin for working with JSON files. This plugin adds support for jq right into the VS Code command pallet. With this plugin installed, when you have a JSON file open you can search it with a jq filter right in VS Code. You simply bring up the command pallet (View → Command Pallet … in the menu, or shift + cmd + P on a Mac), enter the command jq to activate the plugin, then type your jq filter and hit enter again to run it. The results of applying your filter to the open JSON file will appear in VS Code’s output pane.

Debugging

As you build more complex filters, it becomes ever more useful to print debug messages at various points within your pipeline to check your assumptions about the current form of the data. This is where the debug function comes to the rescue. You can insert this function anywhere in your filter chain without disrupting the flow of data because it passes its inputs straight through unchanged, but it also prints the input or a message of your choosing to standard error (STDERR).

Note that Taming the Terminal instalments 15 (‘plumbing’) and 16 (crossing the streams) describe the three default streams and how they can be redirected.

When you call the debug function without arguments it writes a compressed rendering of its input to STDERR in the following format:

["DEBUG:",<input-value>]

We can see this in action by adding a call to debug into the filter chain of the jq command we built in the previous instalment to render friend of the NosillaCast Dr. Andrea Ghez’s Nobel Prize as a string. We’re inserting it just before the string interpolation filter, so it will show the dictionary that will act as the input for that final filter:

jq -r '.prizes[] | .laureates[]? | select(.surname == "Ghez") | debug | "\(.firstname) \(.surname) was awarded her prize for \(.motivation)"' NobelPrizes.json

Running this command creates two outputs, the debug output which was written to STDERR, and the jq command’s regular output which was written to STDOUT:

["DEBUG:",{"id":"990","firstname":"Andrea","surname":"Ghez","motivation":"\"for the discovery of a supermassive compact object at the centre of our galaxy\"","share":"4"}]
Andrea Ghez was awarded her prize for "for the discovery of a supermassive compact object at the centre of our galaxy"

Because both STDOUT and STDERR and connected to the terminal by default, the outputs appear to be the same, but they can be separated using standard terminal redirection, so, if we redirect STDOUT to the file citation.txt we will still see the debug message in the terminal, but the output will go into the text file:

jq -r '.prizes[] | .laureates[]? | select(.surname == "Ghez") | debug | "\(.firstname) \(.surname) was awarded her prize for \(.motivation)"' NobelPrizes.json > citation.txt

If you want to write a custom debug message, you can use the one-argument for of debug to specify what to output.

It’s very common to use string interpolation for this, for example we could use the debug statement debug("We have the following keys: \(. | keys)") to build a string that starts with the text "We have the following Keys: " and then inserts all the keys in the dictionary currently being processed by piping the current value (.) to the keys function which returns an array of all keys in a dictionary. Substituting that debug call into our example above we get:

jq -r '.prizes[] | .laureates[]? | select(.surname == "Ghez") | debug("We have the following keys: \(. | keys)") | "\(.firstname) \(.surname) was awarded her prize for \(.motivation)"' NobelPrizes.json > citation.txt

Which now write the following to STDERR:

["DEBUG:","We have the following keys: [\"firstname\",\"id\",\"motivation\",\"share\",\"surname\"]"]

When you’re working with multiple input files it can be useful to include the name of the file the data currently being processed came from within your debug outputs, this is what the built-in jq function input_filename is for. As an example, you’ll find two files in the installment ZIP, ip-bartb.json & ip-podfeet.json each containing a single top-level dictionary with information about the IP addresses of bartb.ie and podfeet.com. We can process both files at once with the jq command, and the value of input_filename will change as each file is processed:

jq 'debug("processing file \(input_filename), ip is \(.ipAddress)")' ip-*.json > /dev/null

Note that this command ignores standard output by redirecting STDOUT to /dev/null, the computer’s virtual blackhole. This makes it easy to see the output from the debug filter:

["DEBUG:","processing file ip-bartb.json, ip is 37.139.7.12"]
["DEBUG:","processing file ip-podfeet.json, ip is 104.21.34.68"]

Useful Functions for Exploring Data Structures

When working with large or complex data sets, simply sending all the current data to STDERR on one line is not actually that useful, you need to be a bit more targeted in what you ask the debug function to print for you.

This table lists some functions you may find useful when debugging, some we’ve seen before, and some are new.

Function	Description
`length`	Returns the number of key-value pairs in a dictionary, the number of elements in an array, or the length of a string.
`first` & `last` New	Returns the first/last element in an input array, or, if called with a filter as an argumment, the first/last item produced by that filter.
`limit(N, FILTER)` New	Returns an array of up to `N` outputs from the given filter as an array.
`keys` New	Returns the keys in an input dictionary as an array of strings.
`has(KEY_NAME)` New	Returns `true` if an input dictionary contains a given key, or `false`.

When using select to filter an array, I often find it useful to debug the length of the array before and after filtering it, for example:

jq '.prizes | debug(length) | [.[] | select(.year | tonumber < 1950)] | debug(length)' NobelPrizes.json > /dev/null

This will output the following:

["DEBUG:",670]
["DEBUG:",245]

When working with large arrays, I often want to sample just a subset of their contents to see the structure of the elements, or, to check boundary conditions. This is where the first, last, and limit functions come in handy.

As an example, this command uses debug to show the first and last prizes that remain after a call to select:

jq '[.prizes[] | select((.year | tonumber >= 2000 ) and (.year | tonumber < 2010))] | debug(first, last)' NobelPrizes.json > /dev/null

Note that when you pass a filter that contains the and also operator (,), the function runs twice, once for each sub-filter, so we get:

["DEBUG:",{"year":"2009","category":"chemistry","laureates":[{"id":"841","firstname":"Venkatraman","surname":"Ramakrishnan","motivation":"\"for studies of the structure and function of the ribosome\"","share":"3"},{"id":"842","firstname":"Thomas A.","surname":"Steitz","motivation":"\"for studies of the structure and function of the ribosome\"","share":"3"},{"id":"843","firstname":"Ada E.","surname":"Yonath","motivation":"\"for studies of the structure and function of the ribosome\"","share":"3"}]}]
["DEBUG:",{"year":"2000","category":"medicine","laureates":[{"id":"722","firstname":"Arvid","surname":"Carlsson","motivation":"\"for their discoveries concerning signal transduction in the nervous system\"","share":"3"},{"id":"723","firstname":"Paul","surname":"Greengard","motivation":"\"for their discoveries concerning signal transduction in the nervous system\"","share":"3"},{"id":"724","firstname":"Eric","surname":"Kandel","motivation":"\"for their discoveries concerning signal transduction in the nervous system\"","share":"3"}]}]

If you want to get a wider sample of the data, but you don’t want to be overwhelmed, you can use the limit funtion to cap the number of items debugged, for example:

jq 'debug(limit(5; .prizes[]))' NobelPrizes.json > /dev/null

Note that the limit function does not operate on an input array, but instead, on the results from the filter passed as the second argument, so to get 5 elements from the prizes array we need to explode that array.

When working with complex dictionaries, seeing the full contents can be overwhelming, so you can use the keys function to see just the keys a dictionary contains.

For example, to debug the keys in the first Nobel Prize dictionary we can use the command:

jq '.prizes | debug(first | keys)' NobelPrizes.json > /dev/null

This shows us that a typical Nobel Prize dictionary has just three keys:

["DEBUG:",["category","laureates","year"]]

Finally, if you’re just interested in the presence or absence of a single key, you can use the has function.

For example, to check if the first prize has laureates you can run:

jq '.prizes | debug(first | has("laureates"))' NobelPrizes.json > /dev/null

You can also combine has with all to check if every prize has laureates with the following command:

jq '.prizes | debug(all(.[]; has("laureates")))' NobelPrizes.json > /dev/null

Introducing jq Variables

You’ll often find yourself writing a jq filter that does something that could easily be made generic if only you could easily change one value somewhere deep in the filter chain each time you ran your filter.

For example, the majority of the filter to find Dr. Andrea’s prize could be re-used to find Marie Curie’s prizes, if only we could somehow store the search name in a variable. Thankfully, we can!

The jq language does have support for variables, but we’ve ignored them until now because jq is unusual in actively discouraging the use of variables for basic tasks. This is how the jq documentation explains the language’s approach to variables:

Variables are an absolute necessity in most programming languages, but they’re relegated to an “advanced feature” in jq.

In most languages, variables are the only means of passing around data. If you calculate a value, and you want to use it more than once, you’ll need to store it in a variable.

…

In jq, all filters have an input and an output, so manual plumbing is not necessary to pass a value from one part of a program to the next. Many expressions, for instance a + b, pass their input to two distinct subexpressions (here a and b are both passed the same input), so variables aren’t usually necessary in order to use a value twice.

For instance, calculating the average value of an array of numbers requires a few variables in most languages - at least one to hold the array, perhaps one for each element or for a loop counter. In jq, it’s simply add / length - the add expression is given the array and produces its sum, and the length expression is given the array and produces its length.

So, there’s generally a cleaner way to solve most problems in jq than defining variables. Still, sometimes they do make things easier[.]

You can define variables within the body of your filters, but as the docs are at pains to point out, you rarely need to. We will look at how to assign variables within your filters at the very end of the series, but not until then. This is because the same operator (as) is used to define a variable and to loop over a collection of values — yes, jq is a weird language, it sees single variable assignments as single-iteration loops!

Regardless of how you define your variables, In the jq language, all variable names are prefixed with the $ symbol. This means you can never have a variable named x (jq would interpret that as a function name), you must name it $x.

Passing Variables from the Commandline

To solve our problem of making our js scripts more generic, we don’t need to define variables within the body of our filters, we need to pass them into our script from the command line, and the jq command provides a number of optional arguments for doing just that.

We’re going to intentionally ignore the -args and -jsonargs options because they produce so-called positional arguments which are very cumbersome to access within your scripts ($ARGS.positional[0], $ARGS.positional[1] …).

Instead, we’re going to use the --arg and --argjson options to create named variables. Both of these options are a little unusual for those of you familiar with typical teminal commands, having the following two quirky rules:

You use the options multiple times to pass multiple variables
Each optional argument is formed from three low-level shell arguments rather than the more normal two, i.e. --arg NAME STRING and --argjson NAME JSON_STRING

Somewhat confusingly, but with very good reason (because the $ symbol has a meaning on the terminal), the variable names are not prefixed with a $ symbol on the terminal, but they do need to be accessed with their $ prefix within your jq filters.

Note that these options work for simple jq calls where the filter is specified as an argument, and, when using the -f flag to load the filter from a file.

As a simple example, we can pass a variable named $dessert to a filter that simply calls debug to print the value like so:

# -n to signify that jq should not wait for any input
jq -n --arg dessert waffles 'debug("I like \($dessert)")' > /dev/null

Note that the --arg option always treats its value as a string. This means the following inocuous looking command will not behave as expected:

jq -n --arg n 42 'debug("$n is \($n). Is it greater than 100? \($n > 100)")' > /dev/null

The output is:

["DEBUG:","$n is 42. Is it greater than 100? true"]

Why? Because $n does not have the numeric value 42, it has the string value "42", so alphabetically, "42" does sort after "100"!

To pass values of any type, use the --argjson option with a valid JSON as the value (shell escaped of course).

As an example let’s set $n to the numeric value 42 using --argjson and pass it through the same filter:

jq -n --argjson n 42 'debug("$n is \($n). Is it greater than 100? \($n > 100)")' > /dev/null

Now we get the expected result that 42 is not greater than 100:

["DEBUG:","$n is 42. Is it greater than 100? false"]

Optional Variables are Tricky — `$ARGS.named` to the Rescue!

Because of how variables are replaced with their values during script execution, you can’t reference an undefined variable anywhere in your script without an error being thrown.

There are two ways to deal with this reality — error handling, which we won’t meet until later in the series, or the builtin $ARGS.named dictionary.

The $ARGS dictionary always exists, so it can never be undefined, so referencing it can never throw an error. This dictionary also always contains a child dictionary named named, which stores all the named arguments the jq command received from all --arg and --argjson flags. When you pass an argument with --arg x or --argjson x not only does $x get created, but so does $ARGS.named.x. This means that $ARGS.named | has('x') will return true when --arg x or --argjson x are passed, and false otherwise.

Equally important, if you don’t pass --arg x or --argjson x, $x will throw an error, but $ARGS.named.x will simply evaluate to null. So as long as we always use the long form of the variable name, we can make use of optional arguments without learning about error handling by using the following pattern:

$ARGS.named.someArg // "SOME DEFAULT"

For example:

jq -n '$ARGS.named.someArg // "SOME DEFAULT"' # outputs "SOME DEFAULT"
jq -n --arg someArg pancakes '$ARGS.named.someArg // "SOME DEFAULT"' # outputs "pancakes"

Worked Example — Searching Laureates by Name

As a worked example, let’s write a jq script that expects to be passed the named argument search, and then searches for Nobel Prizes won by laureates whose name contains that search string.

We can search for a specific hard-coded name with the following simple script (pbs160b-0.jq):

# Search the Nobel Prizes data set as published by the Nobel Prize Committee
# for prizes won by anyone named Curie.
# Input:    JSON as published by the Nobel Committee
# Output:   An array of prize dictionaries
[
    .prizes[]
    | select(any(.laureates[]?; "\(.firstname) \(.surname)" | contains("Curie")
    ))
]

We can run this script with the command below, and it finds three prizes (the 1935 & 1911 physics prizes, and the 1903 chemistry prize):

jq -f pbs160b-0.jq NobelPrizes.json

Let’s update this script to replace the hard-coded search for "Curie" with a variable named $search instead (pbs160b-1.jq):

# Search the Nobel Prizes data set as published by the Nobel Prize Committee
# by name.
# Input:    JSON as published by the Nobel Committee
# Output:   An array of prize dictionaries
# Variables:
#   $search:    The search string 
[
    .prizes[]
    | select(any(.laureates[]?; "\(.firstname) \(.surname)" | contains($search)
    ))
]

We can run this script with the command below to search for prizes won by a Curie, and it returns the same three prizes, proving the script works:

jq -f pbs160b-1.jq --arg search Curie NobelPrizes.json

Now, what happens if we try to search for "curie" instead? It gives us zero results!

This is a good opportunity to highlight a very common programmer’s trick that works in any language. When you want to search case-insensitively, convert both strings to all lowercase first. In jq we can do that with the ascii_downcase function (which you’ll find in the docs, or, in the next instalment). Let’s update our function to do a case-insensitive containment check (pbs160b-2.jq):

# Search the Nobel Prizes data set as published by the Nobel Prize Committee
# by name.
# Input:    JSON as published by the Nobel Committee
# Output:   An array of prize dictionaries
# Variables:
#   $search:    The search string 
[
    .prizes[]
    | select(any(.laureates[]?;
        "\(.firstname) \(.surname)"
        | ascii_downcase
        | contains($search | ascii_downcase)
    ))
]

If we use this script to search for "curie" with the command below, we get the expected three prizes:

jq -f pbs160b-2.jq --arg search curie NobelPrizes.json

Optional Challenge

Develop a more advanced searching script that expects three variables:

search — a search string to check the laureate names against case-insensitively
minYear — an earliest year a matching prize can have been awarded in.
maxYear — a latest year a matching prize can have been awarded in.

For bonus credit, can you make both year arguments effectively optional?

Final Thoughts

Now that we have learned how to effectively work with complex jq filters by moving them into their own files, adding debug statements, and passing variables, we’re ready to learn more about how to use jq to manipulate data. In the next instalment we’ll learn how to do math with jq, and, how to transform strings. Next we’ll look at transforming arrays and dictionaries, then, we’ll learn about two of jq’s most powerful functions, designed to allow us to edit the contents of arrays and dictionaries in-place, and finally, we’ll finish up with some advanced topics, including defining variables within jq scripts, looping, traditional conditionals, and try-catch-style error handling.

Building Data Structures jq Miniseries Maths, Assignment & String Manipulation

159. Building Data Structures Programming by Stealth 161. Maths, Assignment & String Manipulation

Join the Community

Find us in the PBS channel on the Podfeet Slack.

Podfeet Slack