PBS 163 of X: Lookup Tables & Records (jq)

16 Mar 2024

In my final thoughts at the end of the previous instalment I promised we’d cover everything to do with a special type of dictionary known as a lookup table (sometimes contracted to just lookup) in this instalment, and that the next instalment would be on in-place array manipulation. But, as often happens in this series, as I wrote the notes I realised there were too many new concepts needed to squeeze it all into one instalment, so I’ve split the story of lookups into two. In this instalment we’ll start with some rather philosophical concepts, explaining what lookup tables are, and why we might want to create them, and then we’ll look at building our own lookups with jq. In the next instalment we’ll learn how to de-construct lookup tables, re-index them, and even import them into variables so we can use them to enhance the data we’re primarily processing.

Matching Podcast Episode

Listen along to this instalment on episode 789 of the Chit Chat Across the Pond Podcast.

You can also Download the MP3

Read an unedited, auto-generated transcript with chapter marks: CCATP_2024_03_16

Note: the notes for this instalment were updated after the episode was recorded to clarify the use of ‘lookup’ as a contraction of ‘lookup table’, i.e. as a noun.

Installment Resources

The instalment ZIP file — pbs163.zip

PBS 162 Challenge Solution

The challenge set at the end of the previous instalment was simply to generate an alphabetically sorted list of Nobel Laureates, and for bonus credit, to sort the humans by surname rather than first name.

Let’s start with my sample solution for the basic challenge, which you’ll find in the instalment ZIP as pbs162-challengeSolution-basic.jq:

# Build an alphabetically sorted list of Nobel Prize winners.
# Input:    JSON as published by the Nobel Committee
# Output:   a array of strings

# get all the laureate names
[
    # explode the prizes
    .prizes[]

    # explode the laureates
    | .laureates[]?

    # build and extract the display names
    | ([.firstname, .surname // empty] | join(" "))
]

# sort and de-duplicate the names
# (note that the unique function sorts as well as deduplicates)
| unique

To get the list of names as an array of strings use the basic command:

jq -f pbs162-challengeSolution-basic.jq NobelPrizes.json

To get the list of names as a plain list, use the following command:

# call jq a second time to explode the array
# -r on the second call for raw output
jq -f pbs162-challengeSolution-basic.jq NobelPrizes.json | jq -r '.[]'

This gives us the output:

A. Michael Spence
Aage N. Bohr
Aaron Ciechanover
…
Yves Chauvin
Zhores Alferov
Élie Ducommun

This illustrates a somewhat annoying shortcoming of jq — it doesn’t handle accented characters very elegantly 🙁

Looking at the code, you can see that I first built a list with all laureate names by exploding the prizes array and then exploding the laureates arrays. This gives a series of dictionaries, which is more than we need, so I next extract nicely printed versions of the names using the same logic I used in my sample solution to the previous challenge. At this point, we have a series of individual outputs, but we need an array, so we collect them all together by wrapping everything described so far in a set of square brackets. We now have an array of names as strings.

Our array of names is not sorted, and it definitely contains duplicates because there were quite a number of repeat winners, so there’s a little more work to be done. Because the unique function both sorts and deduplicates, it really is just a little more work, we simply pipe our array of names to unique!

To get the bonus credit we we needed to sort on one value, and display another. My approach was to start similarly to the basic solution, by gathering all the laureates into a single flat array, but instead of extracting the names, I added two new keys to each of the laureate dictionaries, and output the entire dictionaries. I added one new key to store the desired display name, and the other to store the string to sort on. You’ll find my solution in the instalment ZIP as pbs162-challengeSolution-bonusCredit.jq:

# Build an alphabetically sorted list of Nobel Prize winners.
# Human winners are sorted on surname, organisations on organisation name.
# Input:    JSON as published by the Nobel Committee
# Output:   a array of strings

# get all the laureates
[
    # explode the prizes
    .prizes[]

    # explode the laureates
    | .laureates[]?

    # add a field to support sorting
    | .sortBy = ((select(has("surname")) | "\(.surname) \(.firstname)" ) // .firstname)

    # add a field for the name to display
    | .displayName = ([.firstname, .surname // empty] | join(" "))
]

# deduplicate first, or the sorting will get broken
| unique_by(.displayName)

# sort the array on the sortBy key
| sort_by(.sortBy)

# extract the display names
| [
    # re-explode the laureates
    .[]

    # extract just the display name
    | .displayName
]

Again, you can run it directly to get the names as a JSON array, or pipe it through a second call to the jq command to get it as a bare list of strings:

# get the names as a JSON array
jq -f pbs162-challengeSolution-bonusCredit.jq NobelPrizes.json

# get the names as bare strings
jq -f pbs162-challengeSolution-bonusCredit.jq NobelPrizes.json | jq -r '.[]'

That gives us the following output:

Gerardus 't Hooft
Alexei Abrikosov
Jane Addams
…
Bertha von Suttner
Harald zur Hausen
Satoshi Ōmura

Again, this is a somewhat naïve solution because of how jq handles symbols and accented characters, and because our logic for generating the sort string is too simplistic for real-world applications.

Aside: as soon as the previous instalment was published, listener/reader Jill got in touch to point out how difficult sorting on surnames actually is in the real world. There’s literally a whole big spec for how it should be done, as described in this blog post.

Were we to do this for real we would need to sanitise the surname before joining it with the first name by:

Removing some prefixes like 't ', van and von

Sanitise other prefixes like converting Mc to Mac and O' to just O

Then, after appending the first name, we’d need to do another round of sanitations to:

Convert the whole thing to lower case

Substitute all accented characters for their un-accented equivalents

Remove any remaining symbols like -es

You can do all that with jq, and if you were to need to do this often, you could actually define your own jq function named something like toSortName that would take first and surnames as arguments and return sort strings, but that would definitely be some very advanced jq scripting!

OK, getting back to the code — notice that once I have my array of laureate dictionaries, with their two new keys (sortBy & displayName), I first deduplicate them by their display name with unique_by, then I sort them by the key I added for that purpose. Only then, when the array of dictionaries is in the right order, do I explode it to extract the display name, and then assemble just those strings into the final output array.

Finally, I want to draw your attention to how I chose to build the sortBy key:

.sortBy = ((select(has("surname")) | "\(.surname) \(.firstname)" ) // .firstname)

This makes use of the fact that the select function returns either empty or the full dictionary it was passed, and that the alternate operator (//) only actions the right-hand-side if the left-hand-side returns empty, null or false.

So, when there is a surname the select passes the entire laureate dictionary to the pipe into the string interpolation which builds a string with the surname before the first name. That string is not empty, null or false, so the right-hand-side never happens and that string is assigned to the new key.

On the other hand, if there’s no surname, then the select emits zero dictionaries to the pipe to its right, so no string gets constructed, and the left-hand-side evaluates to empty. Because the left-hand-side did not produce a value, the right-hand-side gets executed, and the first name gets assigned to .sortBy.

‘Types’ of Dictionary

Dictionaries are very flexible data structures so they are used to achieve many different ends. This is why computer scientists use conceptual terms like ‘record’ and ‘lookup table’ (or just ‘lookup’) to describe different ways dictionaries get used to store data. These are not rigorously defined terms, but a loose concept. Records and lookup tables are not the only ‘types’ of dictionaries, but they’re the two we need to focus on in this instalment.

Records

Consider an updated version of the menu.json file we’ve used for many of our examples in recent instalments (I’ve added scones):

[
	{
		"name": "hotdogs",
		"price": 5.99,
		"stock": 143
	},
	{
		"name": "pancakes",
		"price": 3.10,
		"stock": 43
	},
	{
		"name": "waffles",
		"price": 7.50,
		"stock": 14
	},
	{
		"name": "scones",
		"price": 7.50,
		"stock": 11
	}
]

This file contains an array of dictionaries, each defining the same keys. This JSON perfectly describes a traditional table with the keys defining the columns, and each dictionary representing a row in the table, or a ‘record’.

In other words, arrays of dictionaries with the same keys can be thought of as records in a traditional database table, or rows in a spreadsheet.

To illustrate the point, here’s the table our file represents:

name	price	stock
hotdogs	5.99	143
pancakes	3.10	43
waffles	7.50	14
scones	7.50	11

So, when the values of the keys in a dictionary map to the names of columns in a conceptual table, we call those dictionaries records.

The Space/Time Tradeoff

Arrays of record-style dictionaries are the least space-inefficient means of storing tabular data in JSON, so you’ll see them a lot. But, they’re very inefficient to search. The only way to find the details in a specific record is to loop over the array of dictionaries. In more traditional programming languages we literally use looping constructs like for, while, or foreach, but in jq we effectively do the same thing but in less obvious ways by exploding arrays and piping the pieces through the select function.

When you have small data sets, you query just once so this inefficiency is negligible, but for larger data sets searched often things go downhill fast, both in terms of processing time and code complexity. Lots of loops or lots of calls to select make for difficult-to-maintain code!

Is there a way to speed up the processing? Of course there is, but it will come with trade-offs. We’re inevitably forced to use space to buy time. We can use the space in RAM or on disk, but something always has to give!

In the world of databases, you trade space for time by adding little mini-tables to your tables called indexes which map column values to row numbers, so you can use an index on, say the name column, to near instantly look up the row number for the record with the name hotdog. Indexes take time to build and update, and disk space to store, so you trade CPU time to manage the index each time you add or update a record and disk space to store the index, in exchange for saving CPU time each time you need to search the table.

When you’re working with JSON data there are no tables or indexes, just arrays and dictionaries, so you need to build index-like data structure from those pieces, and that’s where lookup-type dictionaries come in.

Lookup Tables (or Lookups, or Lookup-type Dictionaries)

A lookup table is a dictionary where the keys are the values for the hypothetical column being indexed. In other words, instead of the keys being column headings like name and price as they are with record-style dictionaries, they are actual names or prices like hotdog and 7.50.

Aside: Note that contrary to what their name might suggest, lookup tables are not necessarily implemented as tables, this may be why some programmers like to contract the name to just lookups. In the world of databases, lookup tables really are tables, but in programming language they are generally implemented as dictionaries.

If your keys are now values for a given column, what are your values? They can either be record-style dictionaries, plain values, or arrays of values. Like there is no single right way to index a database table, there is no single right way to create lookup tables in JSON — it varies each time by the nature of the data being stored and the kinds of queries you need to perform against it. Designing efficient data structures – be that database tables and indexes, or JSON – is a skill, and the only way to learn is to make 10,000 mistakes!

My approach is to leave the structure of the JSON data I receive as-is unless and until I find myself thinking “this is really hard to search”, then I think about how to build one or more lookup tables to simplify things.

First, the most generic kind of lookup table simply indexes records by the most important key – what database people would refer to as the table’s primary key. In the case of our menu that would be the name, so a record lookup would look like:

{
  "hotdogs": {
    "name": "hotdogs",
    "price": 5.99,
    "stock": 143
  },
  "pancakes": {
    "name": "pancakes",
    "price": 3.10,
    "stock": 43
  },
  "waffles": {
    "name": "waffles",
    "price": 7.50,
    "stock": 14
  },
  "scones": {
    "name": "scones",
    "price": 7.50,
    "stock": 11
  }
}

To save space you could remove the name from the records:

{
  "hotdogs": {
    "price": 5.99,
    "stock": 143
  },
  "pancakes": {
    "price": 3.10,
    "stock": 43
  },
  "waffles": {
    "price": 7.50,
    "stock": 14
  },
  "scones": {
    "price": 7.50,
    "stock": 11
  }
}

When you need to read one specific value over and over, you may want a special-purpose lookup table, like for example, prices by food:

{
  "hotdogs": 5.99,
  "pancakes": 3.10,
  "waffles": 7.50,
  "scones": 7.50
}

There is a one-to-one mapping from name to price, but not all indexes are one-to-one. This is when you need lookup tables which map to arrays of values. For example, names indexed by price:

{
  "3.10": [
    "pancakes"
  ],
  "5.99": [
    "hotdogs"
  ],
  "7.50": [
    "waffles",
    "scones"
  ]
}

When are Lookup Tables Worth it?

As a general rule, data that you update infrequently but query regularly is a good candidate for building lookup tables. You’re choosing to trade a little CPU each time the data updates and some disk space permanently in exchange for simpler queries that are easier to write and maintain, and, faster.

In my professional life, I find lookups useful in these scenarios:

Data on a Schedule: A lot of enterprise systems do daily, twice-daily or hourly exports with a scheduled task. Those exports often already write the data in multiple formats like CSV and JSON records list, so adding a few useful JSON lookup tables makes a lot of sense.
Mostly Static Data: Lots of data only updates a few times a year. Our most-used example file illustrates this well. How often are Nobel Prizes awarded? Within an organisation you probably have lots of similar data. Your org structure, your network topology, the domains you own and what you use them for… A good approach here is to keep one canonical copy in the format that’s easiest to manage and keep a shell script right next to it that transforms that data into all kinds of useful formats, including JSON lookup tables.
Complex Data Processing Scripts: Even with very dynamic data, it often makes sense to build temporary lookup tables that will be stale the moment the task finishes. I write a lot of scripts that need to validate, reconcile, or sync data, and they all follow the same design pattern:
1. fetch all the data
2. validate all the data and warn about and discard all dirty data
3. build all the needed lookup tables
4. do the actual work
Most of these lookups only ever exist as variables in RAM, but I do make use of caching in some scripts that run often with data that’s good for a known period of time. My approach here is:
1. before fetching a fresh copy of the data, check for a JSON file on disk
2. if there is a JSON file, and it’s new enough, use it and don’t fetch a fresh copy, if the JSON file is too old, delete it and fetch a fresh copy

With modern CPUs, the efficiency argument feels weak, but I argue strongly that making queries easier to write, read, debug, and maintain is invaluable, and that alone justifies creating lookup tables for me.

How jq Approaches Lookup Tables

Before describing the approach the creators of jq took, I want you to put yourself in their shoes. The problem to be solved is to design a standard jq function that can transform any of the infinity of possible JSON data structures into any of an infinity of possible lookup tables. That’s quite the complex problem!

Faced with these kinds of complex problems, there are a number of possible avenues open to developers.

One approach is to build a mega function that takes many possible optional arguments, or, build a large suite of simpler functions, and hope you capture at least most of the possible scenarios most developers will face most of the time. This is how we got the massively complex find terminal command which tries to provide an optional argument for most types of searches most users need most of the time. One look at the man page for this command makes the trade-off with this approach pretty obvious — so much complexity most users just give up! Thankfully this is not the approach the jq developers took.

Another common approach is to invent or adopt a domain-specific language. Your functions are simple, take just a few arguments, one of which is a string that lets the user express their need in the appropriate domain-specific language. A good example of this approach is the complex problem of pattern matching, where many programming/scripting languages choose to adopt the domain-specific language of regular expressions to simplify their functions. This is in fact how the jq developers choose to tackle pattern matching. Jq provides a few relatively simple functions that expect PCRE-formatted regular expressions as arguments. This is not how they chose to handle the lookup problem though, and with very good reason — there is no equivalent widely used domain-specific language for describing mappings between data structures and lookup tables.

The commonly used approach the jq developers did choose is to use a simple, well defined intermediate data format. Instead of one or more complex functions that go straight from any possible JSON input to any possible lookup table, make jq users transform the input JSON into a simple and clearly defined data structure first, and then pipe that to a very simple jq function. This is a great solution for jq, because transforming JSON is what jq is all about!

jq ‘Entries’

The intermediate format jq’s lookup-related functions use is a simple array of dictionaries, which the documentation describes as entries. A list of entries is an array of dictionaries, each containing two keys — one named key, and one named value. Each entry-type dictionary defines one key-value pair, or entry, in a lookup.

One important subtlety is that the value for the key named key in all entry-type dictionaries must be a string.

For example, we can represent our price lookup by name as the following list of entries:

[
  {
    "key": "hotdogs",
    "value": 5.99
  },
  {
    "key": "pancakes",
    "value": 3.10
  },
  {
    "key": "waffles",
    "value": 7.50
  },
  {
    "key": "scones",
    "value": 7.50
  }
]

Building Lookup Tables

So, to build a lookup-type dictionary with jq we first use some jq filters to build an array of entry-type dictionaries, then we pipe that array to the from_entries function which builds the lookup table.

To illustrate how lookup tables are built, we’ll create the example lookups we used to illustrate the concept in the previous sections.

One-to-One Lookup Tables with `from_entries`

The from_entries function really is a very simple function — it needs an array of entry-type dictionaries as its input, it takes no arguments, and it outputs a lookup-style dictionary. The only subtlety you need to remember is that the values of the keys named key in your entries list must all be strings.

To show all this in action, let’s start by building a very simple lookup table that maps item names to prices in our menu. This is an example of the simplest kind of lookup — a one-to-one mapping between plain values, strings to numbers in this case.

First, we need to use jq’s object construction syntax to build our entries. To do that we need to explode our top-level array of record-type dictionaries, then pipe each one to a filter that builds the entry-type dictionaries:

.[] | {key: .name, value: .price}

Next, we need to reassemble these new entry-type dictionaries into an array using jq’s array construction syntax. In other words, we need to wrap all that in square brackets:

[
	# explode the top-level array of records
	.[]
	
	# build the entries mapping name to price
	| {
		key: .name,
		value: .price
	}
]

We can see what our entries list looks like at this stage with the following command:

jq '[.[] | {key: .name, value: .price}]' menu.json

The final step is to pipe our array of entries to the from_entries function. You can see this finished jq script in the file menu-priceByName.jq:

# Build a lookup of prices by menu item name.
# Input:    A JSON array of menu item records
# Output:   A dictionary with prices indexed by item names

# build the entries list
[
	# explode the top-level array of records
	.[]
	
	# build the entries mapping name to price
	| {
		key: .name,
		value: .price
	}
]

# convert the entries list to a lookup
| from_entries

We can see that in action with the following commands:

# see the lookup
jq -f menu-priceByName.jq menu.json

# save the lookup in efficient JSON form
# @json to format as minimised JSON
# -r for raw output (not formatted as a single JSON string)
jq -f menu-priceByName.jq menu.json | jq -r '@json' > menu-priceByName.json

Next, let’s build a lookup table to map names to full records. The syntax is almost identical, but instead of using the price (.price) for the value, we need to use the entire record, which is the item currently being processed, i.e. . . You’ll find the jq script to do this in the instalment ZIP as menu-byName.jq:

# Build a lookup of menu items by name.
# Input:    A JSON array of menu item records
# Output:   A dictionary with menu item records indexed by name

# build the entries list
[
	# explode the top-level array of records
	.[]
	
	# build the entries mapping name to full record
	| {
		key: .name,
		value: . # the entire record
	}
]

# convert the entries list to a lookup
| from_entries

We can see this in action with the commands:

# see the lookup
jq -f menu-byName.jq menu.json

# save the lookup in efficient JSON form
jq -f menu-byName.jq menu.json | jq -r '@json' > menu-byName.json

Note that this gives us the complete records indexed by name, including the names, so we have duplication of data which we may not want. We can remove this duplication by piping the entries through a filter that uses the del function to delete the names before we assemble the entries into an array:

# build the entries list
[
	# explode the top-level array of records
	.[]
	
	# build the entries mapping name to full record
	| {
		key: .name,
		value: . # the entire record
	}
	
	# remove the name from the record
	| del(.value.name) # note the record is now in a key named value
]

# convert the entries list to a lookup
| from_entries

We can see this in action with the command:

jq '[.[] | {key: .name, value: .} | del(.value.name)] | from_entries' menu.json

This approach works great for one-to-one mappings, but it breaks down when there are two records that have the same value for the key the lookup is being built on. We can see this for ourselves if we try build a lookup table mapping price (converted to a string) to name:

# build the entries list
[
	# explode the top-level array of records
	.[]
	
	# build the entries mapping price to name
	| {
		key: (.price | tostring), # must be string
		value: .name
	}
]

# convert the entries list to a lookup
| from_entries

To see the problem, runs this filter using the following command:

jq '[.[] | {key: (.price | tostring), value: .name}] | from_entries' menu.json

The resulting JSON shows only three items, but our menu has four:

{
  "5.99": "hotdogs",
  "3.10": "pancakes",
  "7.50": "scones"
}

Because waffles and scones both cost $7.50, and because scones come later in the array, our lookup shows scones, but not waffles.

This is why lookup tables representing one-to-many mappings need to index values to arrays of records or values rather than individual records or values.

One-to-Many Lookup Tables with the Help of `group_by`

Trying to build a one-to-many lookup table based on what we’ve met so far in this series is either impossible or very difficult. I certainly wasn’t able to figure out a solution!

Thankfully, it’s quite common to want to build a one-to-many lookup table, so jq provides the standard function needed to make this not just possible, but easy — group_by.

The way I like to describe the group_by function is that it sub-divides an array into an array of arrays based on the result of a filter — every entry in the original array that evaluates to the same value via the filter goes into the same child array in the output.

The group_by function must be passed an array as its input, you must pass a filter as the only argument, and the output will be an array of arrays. The filter passed as an argument will be executed against each element in the input array in turn. Within the filter, . will be the input array element currently being processed. Usually, the filter will simply be the path to a given key in a dictionary.

That’s more difficult to describe than to show, so let’s group our menu by price with the simple jq filter group_by(.price):

jq 'group_by(.price)' menu.json

This produces the following output:

[
  [
    {
      "name": "pancakes",
      "price": 3.10,
      "stock": 43
    }
  ],
  [
    {
      "name": "hotdogs",
      "price": 5.99,
      "stock": 143
    }
  ],
  [
    {
      "name": "waffles",
      "price": 7.50,
      "stock": 14
    },
    {
      "name": "scones",
      "price": 7.50,
      "stock": 11
    }
  ]
]

Since pancakes and hotdogs have unique prices, they are both in child arrays by themselves, but since waffles and scones have the same price, they are together in a single child array.

This gets us much closer to our desired one-to-many lookup table, now we just need to figure out how to convert this array of arrays into entries ready for piping to from_entries.

# group the top-level array of records by price
group_by(.price)

# convert the array of arrays to an array of entries
| [
	# explode the array of arrays
	.[] 
	
	# build an entry for each child array 
	| {
		# all records in any given child array have the same price
		# so take the price from the first record in the array
		key: (.[0].price | tostring), # force to string
    
    # build an array of names from the child array as the value
    value: [
    	# explode the child array
    	.[]
      
      # extract just the name from each record
      | .name
    ]
  }
]

# convert the entries to a lookup
| from_entries

I’ve added detailed comments in the code snipped above to highlight the important points.

We can see what this filter produces with the command:

jq 'group_by(.price) | [.[] | {key: (.[0].price | tostring), value: [.[] | .name]}] | from_entries' menu.json

As promised, it produces a one-to-many lookup from price to name:

{
  "3.10": [
    "pancakes"
  ],
  "5.99": [
    "hotdogs"
  ],
  "7.50": [
    "waffles",
    "scones"
  ]
}

And this time, the waffles were not lost! 😉

An Example Lookup-Building Script

Realistically, only two of our above example lookup tables are useful — the lookup of full records by name, and the lookup of prices by name (that’s why they got their own .jq files). Were I to be maintaining this menu for real, I would keep the menu.json file as the canonical data source, and edit the data there as needed, then I would use a script to build the two useful lookup tables. Over time you may well forget which file is the canonical one, so my approach is to always use the same naming convention — the file with the base name is canonical, all files derived from it get a post-fix describing what they are, for lookup tables, that will be of the form -xByY unless the lookup is to complete records, in which case I just use -byY. I use the same base names for the jq scripts and the JSON files they produce, but since there may be many shell scripts needed to manage all tasks related to a single canonical data file, I always post-fix a description to my shell scripts.

If that all sounds complicated, I think the resulting file structure speaks for itself:

# The data files
menu.json               # the canonical data file
menu-byName.json        # the records lookup
menu-priceByName.json   # the price lookup

# The Scripts
menu-generateLookups.sh # the shell script to generate the lookups
menu-byName.jq          # the jq script to generate the records lookup
menu-priceByName.jq     # the jq script to generate the price lookup

With all that said, you’ll find the sample script for generating the lookups in menu-generateLookups.sh in the instalment ZIP:

#!/usr/bin/env bash

# generate the two lookups and write them to disk
jq -f menu-byName.jq menu.json | jq -r '@json' > menu-byName.json
jq -f menu-priceByName.jq menu.json | jq -r '@json' > menu-priceByName.json

An Optional Challenge

Build a lookup-type dictionary of Nobel Prize records by year.

The basic structure of the solution should be:

{
  "1901": [
    { PRIZE RECORD }
    …
    { PRIZE RECORD }
  ],
	…
	"2023": [
    { PRIZE RECORD }
    …
    { PRIZE RECORD }
  ]
}

For bonus credit, can you build a two-level lookup by year by category?

The basic structure of the solution should be:

{
  "1901": {
    "chemistry": { PRIZE RECORD },
    …
    "physics": { PRIZE RECORD }
  },
	…
	"2023": {
    "chemistry": { PRIZE RECORD },
    …
    "physics": { PRIZE RECORD }
  }
}

Final Thoughts

Hopefully you now have an appreciation for the power of lookup tables, and can build your own as and when you need them. Because lookups are so powerful, you’ll often find that the raw data you need to process is already in the form of a lookup table, and when that happens, you may need to actually disassemble it before you can process the data. Sometimes you’ll disagree with the data author’s choice of index, in those situations you often don’t just want to disassemble the lookup tables you receive, but re-build them with the key of your choice! Finally, the data you are processing often contains references to external data, in those situations, it can be very useful to be able to import lookup tables from external files into variables, and then use those variables to enrich your outputs. A good example would be a data source that refers to users by just their id. If you have a separate JSON file with a lookup table mapping user ids to user records, then you can use that to replace the IDs in your output with names or email addresses or what ever.

All that is on the agenda for the next instalment.

Altering Arrays & Dictionaries jq Miniseries Working with Lookup Tables

162. Altering Arrays & Dictionaries Programming by Stealth 164. Working with Lookup Tables

Join the Community

Find us in the PBS channel on the Podfeet Slack.

Podfeet Slack