Logo
Logo

Programming by Stealth

A blog and podcast series by Bart Busschots & Allison Sheridan.

PBS 161 of X: Maths, Assignment & String Manipulation (jq)

17 Feb 2024

In the previous instalment we learned to use jq as a scripting language, so we can now create complex jq filters without them becoming unreadable and unmanageable. This sets us up nicely for learning about the many ways in which jq can manipulate numbers, strings, arrays, and dictionaries. This will take a few instalments to get through, but we’ll start with jq’s support for maths, assignment operators in jq, and string manipulation.

Matching Podcast Episode

Listen along to this instalment on episode 787 of the Chit Chat Across the Pond Podcast.

You can also Download the MP3

Read an unedited, auto-generated transcript with chapter marks: CCATP_2024_02_17

Installment Resources

PBS 160 Challenge Solutions

The challenge set at the end of the previous instalment was to update the example jq script of searching the Nobel prizes by laureate name to take two additional arguments, an earliest allowable year, and a latest allowable year.

You’ll find the sample solution in the instalment ZIP as pbs160-challengeSolition-1.jq:

# Search the Nobel Prizes data set as published by the Nobel Prize Committee
# by name and year.
# Input:    JSON as published by the Nobel Committee
# Output:   An array of prize dictionaries
# Variables:
#   $search:    The search string for matching the names
#   $minYear:   The earlist a matching prize can have been awarded
#   $maxYear:   The latest a matching prize can have been awarded
[
    .prizes[]
    | select(
        any(.laureates[]?;
            "\(.firstname) \(.surname)"
            | ascii_downcase
            | contains($search | ascii_downcase)
        )
        and (.year | tonumber) >= ($minYear | tonumber)
        and (.year | tonumber) <= ($maxYear | tonumber)
    )
]

This solution is almost entirely the same as the final example in the previous instalment (pbs160b-2.jq), but with the addition of two more conditions inside the select:

and (.year | tonumber) >= ($minYear | tonumber)
and (.year | tonumber) <= ($maxYear | tonumber)

These additional conditions both use the tonumber function before doing comparisons to be sure the comparisons really are numeric, and not lexical (alphabetic).

The challenge offered bonus credit for making the minimum and maximum year arguments optional. You’ll find a sample solution for extra credit it in the instalment ZIP as pbs160-challengeSolition-2.jq:

# Search the Nobel Prizes data set as published by the Nobel Prize Committee
# by name and year.
# Input:    JSON as published by the Nobel Committee
# Output:   An array of prize dictionaries
# Variables:
#   $search:    The search string for matching the names
#   $minYear:   The earlist a matching prize can have been awarded (optional)
#   $maxYear:   The latest a matching prize can have been awarded (optional)
[
    .prizes[]
    | select(
        any(.laureates[]?;
            "\(.firstname) \(.surname)"
            | ascii_downcase
            | contains($search | ascii_downcase)
        )
        and (($ARGS.named | has("minYear") | not) // (.year | tonumber) >= ($ARGS.named.minYear | tonumber))
        and (($ARGS.named | has("maxYear") | not) // (.year | tonumber) <= ($ARGS.named.maxYear | tonumber))
    )
]

The only lines that have changed are the two conditions for checking the years:

and (($ARGS.named | has("minYear") | not) // (.year | tonumber) >= ($ARGS.named.minYear | tonumber))
and (($ARGS.named | has("maxYear") | not) // (.year | tonumber) <= ($ARGS.named.maxYear | tonumber))

The first point to note is the switch from using $mixYear & $maxYear to using $ARGS.named.minYear & $ARGS.named.maxYear. As explained in the previous instalment, this is necessary for optional arguments because of how jq compiles filters — simply mentioning a non-existent variable anywhere in a jq filter will cause errors!

The second point to note is the use of the has() function to detect the presence of the optional arguments by checking if $ARGS.named has a key named minYear/maxYear.

There are probably infinitely many ways to go about altering the action of the filter depending on whether or not the variable is present, but the approach this sample solution takes is to ensure that when either optional argument is not present the appropriate condition always returns true. This is achieved by first inverting the output of the has() function to effectively turn it into a has not function using the not function, and then using the alternate operator (//). Since the alternate operator only executes the filter to its right when the filter to its left returns empty, null, or false, this has the effect of only executing the comparison when the variable exists. Whenever the variable does not exist has() returns false, which the not converts to true, which the alternate operator returns without running the filter to its right at all. Hence, the check evaluates to true and the non-existent argument is never processed.

Mathematical Operators & Functions

Let’s start nice and simple, the jq language supports the usual suspects in terms of mathematical operators:

Operator Description Example
+ Addition jq -n '1 + 1' produces 2
- Subtraction jq -n '3 - 1' produces 2
* Multiplication jq -n '2 * 1' produces 2
/ Division jq -n '4 / 2' produces 2
% Modulo (remainder) jq -n '5 % 3' produces 2

Note the use of the -n flag in the examples to tell the jq command not to expect any input to process.

Something to notice is that the jq language does not support the increment and decrement operators ++ and --.

As well as these operators, the jq language also provides built-in functions for performing arithmetic.

To make the examples shorter to write and easier to interpret we’ll use a pair of JSON files from the instalment ZIP in our examples. First, we’ll use numbers.json which contains a single top-level array of numbers:

[
	-42,
	0,
	3.1415,
	11
]

And we’ll also use menu.json which contains a top-level array of dictionaries, each containing keys that are numeric:

[
	{
		"name": "hotdogs",
		"price": 5.99,
		"stock": 143
	},
	{
		"name": "pancakes",
		"price": 3.10,
		"stock": 43
	},
	{
		"name": "waffles",
		"price": 7.50,
		"stock": 14
	}
]

Note that some of these functions can tolerate having non-numbers sent to them, but many will throw errors for at least some non-numeric data types, so to be safe, only send numbers through these functions. Another thing to watch out for is that even when non-numeric values don’t cause errors, they can produce unexpected results, e.g. the absolute value of a string representation of a negative number is the same string, unchanged — jq -n '"-9.999" | abs' outputs "-9.999".

jq Function Description Example
abs Outputs the absolute value of the input. jq '.[] | abs' numbers.json outputs 42, 0, 3.1415 & 11
floor Rounds input decimal numbers down to the integer part. jq -n '3.1415 | floor' outputs 3, and jq -n '9.999 | floor' outputs 9, but jq -n '-9.999 | floor' outputs 10
sqrt Outputs the square root of the input. jq -n '16 | sqrt' outputs 4
min Outputs the minimum value from an input array of numbers. jq '. | min' numbers.json outputs -42
max Outputs the maximum value from an input array of numbers. jq '. | max' numbers.json outputs 11
min_by(.KEY_PATH) Outputs the dictionary in an input array of dictionaries that has the minimum value for a given key. jq -c '. | min_by(.stock)' menu.json outputs {"name":"waffles","price":7.50,"stock":14}
max_by(.KEY_PATH) Outputs the dictionary in an input array of dictionaries that has the maximum value for a given key. jq -c '. | max_by(.price)' menu.json outputs {"name":"waffles","price":7.50,"stock":14}

Those are the arithmetic functions provided by the jq language itself, but the jq command has a mathematical trick up its sleeve! All the standard 1, 2 & 3 argument C arithmetic functions from C’s standard math library can be used in jq.

For reason’s I can’t quite fathom, the way in which one-argument C functions and two or three-argument C functions get mapped to jq are not consistent. For one-argument C functions the input to the jq function gets passed as the C function’s argument. But, for two and three argument C functions all arguments need to be specified in jq.

The exact functions available will depend on the version of the C libaries installed on the computer, but some widely available useful ones of note are:

C Maths Function Description Example
ceil Round an input decimal number up to the nearest integer. An obvious companion for jq’s built-in floor and C’s round functions. jq -n '3.1415 | ceil' outputs 4, and jq -n '9.999 | ceil' outputs 10
round Round an input decimal number up or down to the nearest integer. An obvious companion for jq’s built-in floor and C’s ceil functions. jq -n '3.1415 | round' outputs 3, and jq -n '9.999 | round' outputs 10
pow Raise an input number to the power passed as the first argument. jq -n '2 | pow(.; 3)' outputs 8

ceil, round, and floor are very similar but treat negative numbers in different ways. Let’s look at positive and negative values in each of them for a direct comparison.

Example Returns
ceil:  
jq -n '9.999 | ceil' 10
jq -n '-9.999 | ceil' -9
round:  
jq -n '9.999 | round' 10
jq -n '-9.999 | round' -10
floor:  
jq -n '9.999 | floor' 9
jq -n '-9.999 | floor' -10

Assignment Operators

In most programming languages, assignment operators are used to set the values of variables, but that’s not the case in jq, instead, assignment operators are used to set values within the item currently being processed, i.e. for assigning values within ..

Most programming languages have just one primary assignment operator, and perhaps a few shortcut operators like += to save developers some typing, but jq has two primary assignment operators, as well as some shortcuts. The two primary assignment types are known as plain assignment and update assignment. The primary operators are = for plain assignment, and |= for update assignment.

Regardless of which operators you use, the left-hand side must be the path to a value within ., and the right-hand side will always be a new value to save at that path. Paths are expressions like .year, .laureates[0] etc.

The left-hand-side of an assignment operator has to be a path within ., e.g. .year to alter the year key of the dictionary currently being processed. The right-hand-side must be an expression that produces the value to be assigned.

Plain Assignment (=)

The plain assignment operator does what you would expect, inserts the value to the right of the operator into the item currently being processed at the path to the left of the operator.

So, to set a new price for hotdogs (first item on the menu) in our menu (menu.json) we would simply use the filter:

.[0].price = 4.20

You can see this in action with the following command:

jq '.[0].price = 4.20' menu.json

Which outputs:

[
  {
    "name": "hotdogs",
    "price": 4.20,
    "stock": 143
  },
  // 
]

Notice the price has been changed from 5.99 to 4.20.

You can use . on the right-hand side of a plain assignment, and it has the same value as it does on the left-hand side, i.e. the item currently being processed. So, we could set the price of a hotdog to be 10% of the amount of remaining stock with the following filter:

.[0].price = .[0].stock / 10

We can see that in action with the command:

jq '.[0].price = .[0].stock / 10' menu.json

Which produces the output:

[
  {
    "name": "hotdogs",
    "price": 14.3,
    "stock": 143
  },
  // 
]

We can of course also set a new price that is dependent on the old price, say a dollar reduction with the filter:

.[0].price = .[0].price - 1

This is very repetitive, there must be a better way?

Update Assignment

Update assignments are specifically designed for situations where the new value depends directly on the old value. To avoid the kind of repetition we saw in the previous example, with all update assignments, the value of . is the current value stored at the path (not the item currently being processed like with plain assignment).

In other words, if we update .price, then the value for . is not the entire dictionary, but just the current value for .price.

To see the power of this, let’s update our example above to explode and recombine our menu, and let’s reduce all our prices by 1 dollar. We can do that with the filter:

.[]
| [
  .price |= . - 1
]

We can collapse that filter onto a single line and run it with the following command:

jq '.[] | [.price |= . - 1]' menu.json

This gives us the output:

[
  {
    "name": "hotdogs",
    "price": 4.99,
    "stock": 143
  }
]
[
  {
    "name": "pancakes",
    "price": 2.1,
    "stock": 43
  }
]
[
  {
    "name": "waffles",
    "price": 6.5,
    "stock": 14
  }
]

Notice that every item’s price is reduced by 1.

Plain -v- Update Assignments

I can’t over emphasise the importance of understanding the difference between these two assignment types, so let’s look at another example to cement the point.

Consider the following simple dictionary:

{
  "breakfast": "pancakes",
  "lunch": "BLT",
  "dinner": "pizza"
}

If that dictionary were an input to a plain assignment, then the value of . on both the left and right-hand sides would be that full dictionary, so you could set the value for dinner to become the same as the value for breakfast with the assignment .dinner = .breakfast. The result of this assignment would be to change the input dictionary to become:

{
  "breakfast": "pancakes",
  "lunch": "BLT",
  "dinner": "pancakes"
}

We can see this for ourselves with the following jq command:

# -n tell jq not to expect any input
jq -n '{breakfast: "pancakes", lunch: "BLT", dinner: "pizza"} | .dinner = .breakfast'

If we send the same dictionary to the update-assignment .dinner |= .breakfast, we get an error, why? Because with an update assignment, the value of . is the currenty value of .dinner, because update assignments are designed for use when you want the new value to depend on the old value.

So, if we wanted to append to our dinner we could do it the long way with, .dinner = "\(.dinner) and nachos", or, we could avoid the need to specifiy the path twice and simple use .dinner |= "\(.) and nachos", as demonstrated by this jq command:

jq -n '{breakfast: "pancakes", lunch: "BLT", dinner: "pizza"} | .dinner |= "\(.) and nachos"'

To keep the two types of assignment clear in my head, the question I ask myself is “do I want to set a new value, or update the existing value”. If I want to update the existing value, I use an update assignment, and . becomes the current value, which seems like a sensible decision by the jq developers given the problem to be solved.

The Update Assignment Operators

As we’ve already seen, the generic update assignment operator is |=, but jq provides a suite of additional update operators that are in effect simply shortcuts to make your filters easier to write and read.

Operator Description Shortcut for
+= Increment by … |= . +
-= Decrement by … |= . -
*= Multiply by … |= . *
/= Divide by … |= . /
%= Modulous … |= . %
//= Alternate assignment |= . //

While jq does not have increment or decrement operators (++ & --in most languages), we can at least use += 1 and -= 1 instead, as shown in this example:

jq -nc '{a: 2, b: 2} | .a += 1 | .b -= 1' # outputs {"a":3,"b":1}

Finally, the alternate assignment operator (//=) can be useful for assigning default values for dictionary keys that may or may not be present. For example, our example Nobel Prizes data set (NobelPrizes.json in the instalment ZIP) does not have a laureates array in the dictionaries representing prizes that were not awarded. We could add an empty laureates array into every un-awarded prize with the command:

jq '[.prizes[] | .laureates //= []]' NobelPrizes.json

Always ‘by Value’, Never ‘by Reference’

In many modern programming languages, including JavaScript, complex data types are passed by reference while simple values are passed by value.

This fact trips up many novice programmers because it can cause spooky action at a distance. For example, consider the following JavaScript code. What will get logged to the console?

let a = [1, 2, 3];
let b = a;
a.push(4);
console.log(b);

If JavaScript assigned b to be a copy of a then b would have a’s original value, [1, 2, 3], but that’s not what happens, because in Javascript a does not contain the array, it contains a reference to the array, so when b gets assigned to be the same as a it becomes a reference to the same array, so the log statement will output [1, 2, 3 , 4] (as you can see for yourself using your browser’s JavaScript console)!

But, unlike JavaScript, jq always copies by value, never by reference.

So, in jq, when you assign the value of .a to .b, then update .b, there is no spooky action at a distance. This is demonstrated by the following simple jq script (pbs161a.jq in the instalment ZIP):

# Demonstrate that jq uses pass by value rather than pass by reference
# Input:    NONE
# Output:   A dictionary with keys a & b

# start with a dictionary with one key, a, that is an array
{
    a: [1, 2, 3]
}

# update the dictionary with a new key b that is a copy of a
| .b = .a

# update a by adding a 4th value to the array
| .a[3] = 4

We can run this script with the command:

# -f tells jq to read its filter from a file
# -c tells jq we want compact output
jq -n -f pbs161a.jq -c

When we do, it produces the following output (reflowed to save space):

{"a":[1,2,3,4],"b":[1,2,3]}

This shows that unlike in JavaScript, in jq, modifying .a after setting .b to be equal to .a does not change .b.

Transforming Strings

As well as allowing us to transform numbers, the jq languages provides many mechanisms for transforming strings.

‘Adding’, ‘Multiplying’ & ‘Dividing’ Strings (Operator Overloading)

The jq language has, to use programming jargon, overloaded some of its operators, including the addition (+) operator. Overloaded operators do different things when presented with different types of data to operate on. Many programming languages use overloading, and while we may not have called it out by name when we explored JavaScript, we did see operator overloading when we used + both for arithmetic addition and for string concatenation.

Note that the jq language’s support of operator overloading is basic compared to what we saw with JavaScript. JavaScript’s operator overloading can handle different data types on each side of the operator, and then automatically casts (i.e. converts) one of the two to match the other before applying the appropriate action. This requires complex rules for determining what to do for every possible combination of types. In jq things are much simpler. With just a few specific exceptions, the types on each side of the overloaded operators must match!

Bringing this back to strings, like in JavaScript, jq’s + operator is overloaded for both arithmetic addition and string concatenation. That is to say when both inputs are numbers they get added, and when both are strings they get concatenated:

# two numbers - arithmetic addition (as above)
jq -n '22 + 20' # outputs 42

# two strings - concatenation
jq -n '"a string" + " another string"' # outputs "a string another string"

# mismatched types
jq -n '"a string" + 42' # throws an error

As we’ll learn later, the + operator is even more overloaded than this – it can also handle arrays and dictionaries!

The multiplication operator (*) is also overloaded, and it is one of those where a specific mismatch of types is explicitly supported. You can multiply a string by a number to concatenate it with itself that many times, in other words, to repeat it that many times:

jq -n '"Ho" * 3' # outputs "HoHoHo"

Finally, the division operator / has been overloaded to work with two strings. When a string is divided by another string, the string to the left is split on the string to the right, and an array is returned. In other words, string division with / is a shortcut for the split function.

jq -nc '"10:20:30" / ":"' # outputs ["10","20","30"]

Built-in String Functions

The jq language provides a selection of built-in functions for manipulating strings, some of the more useful ones are listed below:

Function Description Example
ascii_downcase Convert an input string to lower case. jq -n '"Something" | ascii_downcase' outputs "something"
ascii_upcase Convert an input string to upper case. jq -n '"Something" |ascii_upcase' outputs "SOMETHING"
ltrimstr Remove a given prefix from an input string, or return the string un-changed if the prefix is not present. jq -n '"DEBUG: something" | ltrimstr("DEBUG: ")' outputs "something"
rtrimstr Remove a given postfix from an input string, or return the string un-changed if the postfix is not present. jq -n '"something (DEBUG)" | rtrimstr(" (DEBUG)")' outputs "something"
sub Perform string substitution on an input string to produce the output string. A regular expression should be passed as the first argument, a replacement as the second, and optionally regular expression flags as a third. jq -n '"2023-11-12: Something" | sub("[0-9]{4}-[0-9]{2}-[0-9]{2}: "; "")' outputs "Something", i.e. replaces a leading timestamp with an empty string.

Optional Challenge

Write a jq script that will take as its input the Nobel Prizes data set (NobelPrizes.json) and sanitises it using the various assignment operators to achieve the following changes:

  1. Add a boolean key named awarded to every prize dictionary to indicate whether or not it was awarded.
  2. Ensure all prize dictionaries have a laureates array. It should be empty for prizes that were not awarded.
  3. Add a boolean key named organisation to each laureate dictionary, indicating whether or not the laureate is an organisation rather than a person.
  4. Add a key named displayName to each laureate dictionary. For people, it should contain their first & last names, and for organisations, just the organisation name.

Final Thoughts

In this instalment we made a good start on our exploration of how jq can alter data. We’ll keep the momentum going in the next instalment where we’ll look at how we can alter arrays and dictionaries.

Join the Community

Find us in the PBS channel on the Podfeet Slack.

Podfeet Slack