PBS 155 of X: Introducing JSON Processing from the Shell with jq (jq)
JSON is a commonly used data format that has its origins in the JavaScript syntax. We first met it all the way back in instalment 17, and we’ve bumped into it from time to time since then, for example, as the format used for NodeJS configuration files, and as the format for data returned by many of the web services we can interact with in JavaScript via AJAX.
JSON is a very powerful means of expressing even complex data, but it can be tricky to interact with from the command line or shell scripts because it doesn’t lend itself to being queried by simple text-processing commands like `grep`. To work with JSON in a shell, you need a terminal command that truly understands JSON, and that’s where `jq` comes in.
Don’t be fooled by the short name, there’s nothing simplistic about `jq`! It’s extremely powerful, but, rather like regular expressions, the syntax is very dense. You can do an awful lot with just a few characters, but it takes a lot of nuanced understanding to work out what a `jq` command really does, or to learn to write one yourself.
It’s going to take us a few instalments to explore the `jq` terminal command. As usual, this won’t be an exhaustive look at everything the command can do. Instead, we’ll focus on three key abilities `jq` provides:
- Pretty-printing JSON, helping us to read the outputs from things like web services
- Extracting information from JSON
- Transforming JSON data from one form to another
We’ll explore these features as they relate to three common use cases:
- APIs, particularly web services, that return data in JSON format
- Reading information from simple JSON files like config files
- Querying and processing large JSON-formatted data sets
Matching Podcast Episodes
Listen along to this instalment on episode 778 of the Chit Chat Across the Pond Podcast.
You can also Download the MP3
Read an unedited, auto-generated transcript: CCATP_2023_11_18
Episode Resources
- The instalment ZIP file — pbs155.zip
Prologue — A Very Quick JSON Refresher
This instalment assumes fluency in JSON. If you’ve never met JSON before, don’t worry, it’s quite an intuitive format, and there are lots of online resources available to help you learn, for example:
- I wrote a simple intro to JSON a few years ago — www.bartbusschots.ie/…
- A good tutorial from FreeCodeCamp — www.freecodecamp.org/…
Having said that, here’s a quick reminder of the basics!
Everything in JSON is one of six types:
- A number, e.g. `42` or `3.1415`
- A string, e.g. `"A boring string"`
- A boolean, i.e. `true` or `false`
- The null value, i.e. `null`
- An array of zero or more values, e.g. `[true, 42, "some string"]`
- A dictionary (AKA object) of key-value pairs, e.g. `{"key1": "value 1", "another_key": 42}`
Both arrays and dictionaries can be nested, allowing complex pieces of information to be represented, e.g., the details of a Nobel prize:
{
  "year": "2023",
  "category": "chemistry",
  "laureates": [
    {
      "id": "1029",
      "firstname": "Moungi",
      "surname": "Bawendi",
      "motivation": "\"for the discovery and synthesis of quantum dots\"",
      "share": "3"
    },
    {
      "id": "1030",
      "firstname": "Louis",
      "surname": "Brus",
      "motivation": "\"for the discovery and synthesis of quantum dots\"",
      "share": "3"
    },
    {
      "id": "1031",
      "firstname": "Aleksey",
      "surname": "Yekimov",
      "motivation": "\"for the discovery and synthesis of quantum dots\"",
      "share": "3"
    }
  ]
}
Introducing jq
The first point of confusion that needs to be addressed is that the name jq is itself overloaded (to borrow a programming term): it has two very different meanings. Firstly, jq is a language, and secondly, it is the name of a terminal command for processing JSON strings with that language. I like to think of the name as being an acronym for JSON Query, but the official website and documentation give no explanation of the name, and universally write it in all lower-case.
In a probably doomed attempt to minimise confusion between the language and the terminal command, I’m going to always use a fixed-width font for the terminal command, and a regular font for the language, so jq is the language, and `jq` is the terminal command.
Presumably because JSON is a relatively new format, at least compared to `sh`, `bash`, or even `zsh`, the need for a terminal command to process it is too new for a standard out-of-the-box command to have emerged. The `jq` command is an open source third-party tool you’ll need to install yourself.
Useful Resources
- The official jq home page — jqlang.github.io/jq/
- The official docs — jqlang.github.io/manual/
- A nice web version of the `jq` man page — manpages.org/…
- The language’s Wikipedia page — en.wikipedia.org/… (it has a nice summary of the syntax)
- A useful on-line jq playground — jqplay.org/
Installing `jq` (and `curl`)
Because the `jq` command doesn’t ship as standard with the mainstream command shells, you’ll need to install it. That’s generally easiest on Linux, where it’s available from most distros’ standard package managers, e.g. `sudo yum install jq` or `sudo apt-get install jq`. Things are a little more complicated on Windows and the Mac.
The most common advice for Windows users is to use the Chocolatey package manager. Once you have that installed, you can install `jq` from a PowerShell terminal running as administrator with the command `choco install jq -y`.
Similarly, the most common advice for Mac users is to use the Homebrew package manager. Once Homebrew is installed, you can install `jq` with the command `brew install jq`.
Because JSON is returned by so many web APIs, our examples will use the `curl` command to make web requests from the command line. The `curl` command ships as standard on the Mac, as well as on many Linux distributions (or it’s available from the standard package manager). Windows users can get `curl` from Chocolatey with the command `choco install curl -y`.
Using the `jq` Command
Conceptually, the `jq` terminal command is very simple: the inputs are JSON strings for processing and a so-called jq filter to apply to those JSON strings, and the outputs are one or more other strings that may or may not also be in JSON format.
The jq filter is the jq code for processing the JSON, and it should be the first argument. The JSON to be processed can be read from `STDIN` or from one or more files specified as additional arguments. The command does support a few options, but we’ll leave those for another time.
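As a minimal sketch of those two input routes, the snippet below feeds the same filter first from `STDIN` and then from two files. The sample JSON, the variable names, and the temporary file names are all made up for illustration; they are not part of the instalment's ZIP file.

```shell
# Hypothetical sample data, invented for this sketch.
json='{"name": "demo", "version": 3}'

# Route 1: the JSON arrives on STDIN, and the filter is the first argument.
from_stdin=$(printf '%s\n' "$json" | jq '.version')

# Route 2: the JSON is read from files named after the filter.
tmpdir=$(mktemp -d)
printf '%s\n' "$json" > "$tmpdir/a.json"
printf '%s\n' '{"name": "other", "version": 7}' > "$tmpdir/b.json"
from_files=$(jq '.version' "$tmpdir/a.json" "$tmpdir/b.json")

echo "From STDIN: $from_stdin"
echo "From files (one result per file):"
echo "$from_files"

# Tidy up the temporary files.
rm -r "$tmpdir"
```

When more than one file is given, the filter is applied to each file in turn, so the second command prints one value per file.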
For reasons that defy logic, when Stephen Dolan developed the first version of jq back in 2012, he chose to use characters that have special meaning on the command line as part of his jq syntax, most egregiously the pipe (`|`). This means that most jq syntax needs to be quoted when used on the command line. To avoid getting into trouble, my advice is to always quote jq syntax when writing `jq` commands!
Since we’re going to be focusing on three common uses for `jq`, let’s start with a simple example of each. You’ll find the JSON files used in the examples in this instalment’s ZIP file.
Example 1 — Pretty-printing JSON from a Web Service
The very useful site freeipapi.com/ offers a web service API that returns information about your IP address in JSON format from the URL https://freeipapi.com/api/json. We can call this API from the terminal using the `curl` command like so:
curl https://freeipapi.com/api/json
When you do, you’ll see it returns its JSON as a long single-line string. For example, this is what’s returned when I run the command on my web server:
{"ipVersion":4,"ipAddress":"37.139.7.12","latitude":52.378502,"longitude":4.89998,"countryName":"Netherlands","countryCode":"NL","timeZone":"+02:00","zipCode":"1000","cityName":"Amsterdam","regionName":"Noord-Holland","continent":"Europe","continentCode":"EU"}
This is not very human-friendly, so to see the information nicely formatted we can pipe it through `jq` without any jq filter with the command:
curl -s https://freeipapi.com/api/json | jq
What we get now is nicely formatted and syntax-highlighted JSON:
Example 2 — Extracting a Value from a Config File
We can use the `jq` command with a jq filter to extract a specific piece of information from a file. For example, we can extract the number of JavaScript dependencies for the `this-ti.me` website from its NPM package config file:
jq '.dependencies|length' this-ti.me-package.json
Notice that the jq filter is quoted, and that it absolutely must be, because it contains the `|` symbol.
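If you don't have the ZIP file to hand, the same filter can be tried against a small inline document. This is only a sketch: the package name and the two dependencies below are invented for illustration and are not taken from the real `this-ti.me` config file.

```shell
# Hypothetical, minimal package.json-style document (not the real this-ti.me file).
pkg='{"name": "example-site", "dependencies": {"bootstrap": "^5.3.0", "luxon": "^3.4.0"}}'

# .dependencies selects the dictionary of dependencies, and the pipe hands it
# to length, which counts the key-value pairs in a dictionary.
# The single quotes stop the shell from treating | as its own pipe.
dep_count=$(printf '%s\n' "$pkg" | jq '.dependencies|length')

echo "Number of dependencies: $dep_count"
```

Because the invented document lists two dependencies, the filter outputs `2`.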
Example 3 — Processing a JSON Data Set
The file `NobelPrizes.json` contains details of all Nobel prizes ever awarded in JSON format. It is in effect a JSON-formatted database, and we can use the `jq` command to query that database to find specific pieces of information and assemble them into a desired output.
Before we query this database, let’s use `jq` to pretty print the first entry in the database to show the structure of the records with the command `jq '.prizes[0]' NobelPrizes.json`:
{
  "year": "2023",
  "category": "chemistry",
  "laureates": [
    {
      "id": "1029",
      "firstname": "Moungi",
      "surname": "Bawendi",
      "motivation": "\"for the discovery and synthesis of quantum dots\"",
      "share": "3"
    },
    {
      "id": "1030",
      "firstname": "Louis",
      "surname": "Brus",
      "motivation": "\"for the discovery and synthesis of quantum dots\"",
      "share": "3"
    },
    {
      "id": "1031",
      "firstname": "Aleksey",
      "surname": "Yekimov",
      "motivation": "\"for the discovery and synthesis of quantum dots\"",
      "share": "3"
    }
  ]
}
Now, let’s use a complex jq filter to extract just the number of recipients for each prize in 2023 with the command:
jq '[.prizes[] | select (.year == "2023") | {prize: .category, numRecipients: (.laureates|length)}]' NobelPrizes.json
This produces the JSON:
[
  {
    "prize": "chemistry",
    "numRecipients": 3
  },
  {
    "prize": "economics",
    "numRecipients": 1
  },
  {
    "prize": "literature",
    "numRecipients": 1
  },
  {
    "prize": "peace",
    "numRecipients": 1
  },
  {
    "prize": "physics",
    "numRecipients": 3
  },
  {
    "prize": "medicine",
    "numRecipients": 2
  }
]
Notice that we have not only filtered the data set down to just the records we are interested in, we have also transformed those records into entirely new records that contain just two key-value pairs each. Neither key existed in the original data; one of the values is copied directly, and the other is computed.
We could achieve the same results with a JavaScript function, but it would take us a lot more code than that dense but powerful jq filter!
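To make that dense filter a little less intimidating, here is a sketch that builds it up one stage at a time. It runs against a made-up miniature data set in the same shape as `NobelPrizes.json` (the three records and their IDs are invented), and it uses the `-c` flag, described later in this instalment, purely to keep each result on one line.

```shell
# Hypothetical miniature data set in the same shape as NobelPrizes.json.
data='{"prizes": [
  {"year": "2023", "category": "peace", "laureates": [{"id": "1"}]},
  {"year": "2023", "category": "physics", "laureates": [{"id": "2"}, {"id": "3"}, {"id": "4"}]},
  {"year": "2022", "category": "peace", "laureates": [{"id": "5"}, {"id": "6"}]}
]}'

# Stage 1: .prizes[] emits each prize separately; keep just the category.
# The outer [ ] collects the results back into a single array.
step1=$(printf '%s\n' "$data" | jq -c '[.prizes[] | .category]')

# Stage 2: select() only passes on prizes whose year is the string "2023".
step2=$(printf '%s\n' "$data" | jq -c '[.prizes[] | select(.year == "2023") | .category]')

# Stage 3: build a brand new dictionary for each prize that survived.
step3=$(printf '%s\n' "$data" | jq -c '[.prizes[] | select(.year == "2023") | {prize: .category, numRecipients: (.laureates|length)}]')

echo "$step1"   # ["peace","physics","peace"]
echo "$step2"   # ["peace","physics"]
echo "$step3"   # [{"prize":"peace","numRecipients":1},{"prize":"physics","numRecipients":3}]
```

Each stage only adds one more piece after a `|`, which is a handy way to read (or debug) a long filter: run the leftmost part on its own, then keep extending it.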
Pretty Printing JSON with jq
Let’s finish this first instalment by looking at some useful options for controlling how `jq` pretty-prints JSON.
To see these different flags in action, you can use the wonderful free weather API at `wttr.in`.
To see the current weather conditions where you are in `jq`’s default pretty-printed format, use the command:
curl -s 'http://wttr.in/?format=j1' | jq '.current_condition[0]'
As you can see, by default, when outputting to a terminal, `jq` uses syntax highlighting in its output, i.e. it adds colours to the text. You can disable that behaviour with the `--monochrome-output` or `-M` option:
curl -s 'http://wttr.in/?format=j1' | jq '.current_condition[0]' -M
The `jq` command also defaults to spreading the output across multiple lines. You can stop it doing that with the `--compact-output` or `-c` flag:
curl -s 'http://wttr.in/?format=j1' | jq '.current_condition[0]' -c
Finally, by default `jq` uses two spaces as the indent. You can use tabs instead with the `--tab` flag:
curl -s 'http://wttr.in/?format=j1' | jq '.current_condition[0]' --tab
Or, you can use a different number of spaces with the `--indent` option:
curl -s 'http://wttr.in/?format=j1' | jq '.current_condition[0]' --indent 4
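If you'd like to compare these options without making a web request, the sketch below applies them to a small inline JSON string. The sample data and its key names are invented for illustration and only loosely imitate a weather record.

```shell
# Hypothetical sample data, invented for this sketch.
sample='{"temp_C": "11", "wind": {"dir": "SW", "kmph": "19"}}'

# Default formatting: multiple lines, two-space indent.
default_out=$(printf '%s\n' "$sample" | jq '.wind')

# Compact formatting: everything on one line.
compact_out=$(printf '%s\n' "$sample" | jq -c '.wind')

# Tab indentation instead of spaces.
tab_out=$(printf '%s\n' "$sample" | jq --tab '.wind')

# Four-space indentation.
indent4_out=$(printf '%s\n' "$sample" | jq --indent 4 '.wind')

# Note: the results are captured in variables rather than written straight
# to the terminal, so jq adds no colour here, even without -M.
printf 'Default:\n%s\n\n' "$default_out"
printf 'Compact:\n%s\n\n' "$compact_out"
printf 'Tabs:\n%s\n\n' "$tab_out"
printf 'Indent 4:\n%s\n' "$indent4_out"
```

The options are given before the filter here, while the commands above put them after it; `jq` accepts both orderings.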
Final Thoughts
Hopefully I’ve whetted your appetite for the `jq` command and its powerful filter syntax. So far we’ve only looked at one of the command’s uses in detail: pretty-printing JSON. In the next instalment we’ll move on to using the jq language to extract specific pieces of information from a JSON string, and then we’ll move on to the most powerful and complex part of all, transforming data with jq.