Programming by Stealth

A blog and podcast series by Bart Busschots & Allison Sheridan.

PBS 105 — Git: Seeing the Past

The problem we’re trying to solve throughout this entire mini-series is capturing a complete history of our creative projects. So far we’ve learned how to save points-in-time as commits, but what good is saving a snapshot if we can’t look at it?

So far the closest we’ve come to looking into the past is using the git diff command to see the differences in a given file between the working copy and the staged version, or between the working copy and the previously committed version. What we haven’t done yet is look back further into the past.

We’ll start this instalment by picking up where we left off, using git diff, but this time we’ll travel further back in time than simply the staging area or the previous commit.

Looking at changes is very useful, but sometimes you want to see everything as it was at a given time, so we’ll also learn how to export an entire past version as a stand-alone archive.

Finally, we’ll learn how to transport our entire working copy back in time, and why that might be a dangerous thing to do!

Matching Podcast Episode

Listen along to this instalment on episode 659 of the Chit Chat Across the Pond Podcast.

You can also Download the MP3

Instalment Resources

Playing Along

If you’d like to play along with the examples you’ll need to download this instalment’s ZIP file and unzip it. Open a terminal and change into the folder you extracted the ZIP into. You’ll find a file in there named pbs105a.bundle. This is a bundled version of the repository we created in the previous instalment: a single file that contains all our commits from last time. To pick up where we left off, we need to make a new repository and import all the commits from the bundle. We’ll name our new repository pbs105a. To create this new repo we’ll take the following steps:

  1. create a folder named pbs105a
  2. change into that folder
  3. initialise it as a Git repo
  4. if needed, change the default branch from master to main
  5. import all the commits from the bundle into our new repository

The commands to do all this are

mkdir pbs105a
cd pbs105a
git init
[ "$(git symbolic-ref --short HEAD)" = 'master' ] && git checkout -b main
git pull ../pbs105a.bundle
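
If you’re curious what a bundle contains before you pull from it, Git can tell you. The following is only a sketch in a throwaway folder: the repository, the folder names, and the file demo.bundle are all made up for the demo, and it builds its own tiny bundle rather than using the one from the instalment ZIP.

```shell
# Throwaway demo: every folder and file name below is made up.
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo" || exit 1
git symbolic-ref HEAD refs/heads/main   # name the branch main, whatever the default is
git config user.name 'Demo User'        # identity for this demo repo only
git config user.email 'demo@example.com'
echo 'hello' > README.md
git add README.md
git commit -q -m 'initial version'

# Pack the whole main branch into a single bundle file.
git bundle create "$work/demo.bundle" main 2>/dev/null

# Peek inside the bundle without importing anything.
heads=$(git bundle list-heads "$work/demo.bundle")
echo "$heads"

# Import into a second, empty repo, much like the steps above.
git init -q "$work/copy"
cd "$work/copy" || exit 1
git symbolic-ref HEAD refs/heads/main
git bundle verify "$work/demo.bundle"   # checks the bundle is valid for this repo
git pull -q "$work/demo.bundle" main
imported=$(git log --format=%s)
echo "$imported"
```

Because this demo bundle was created from the branch main only, the pull names that branch explicitly.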

We can now verify that we have imported our full history of commits with git log:

bart-imac2018:pbs105a bart% git --no-pager log
commit ff8bc623afaabb9c2bcde5bc1bc492dc685a2f3e (HEAD -> main)
Author: Bart Busschots <opensource@bartificer.net>
Date:   Sat Nov 7 11:08:37 2020 +0000

    Added my history of programming in Hello World programs

commit 2e3a0ce5bfa15033a7c5cffdced2947fbbb7fe2a
Author: Bart Busschots <opensource@bartificer.net>
Date:   Sat Nov 7 11:08:24 2020 +0000

    Correct semantic markup by moving title and lead into header and adding a footer

commit 80025b1e3e893b73264bc88345486ad409f3a143
Author: Bart Busschots <opensource@bartificer.net>
Date:   Sat Nov 7 11:06:34 2020 +0000

    Added link to Wikipedia article on the Hello World meme

commit a6d4bfda8e86618cd0e95185e1ced17e411467ef
Author: Bart Busschots <opensource@bartificer.net>
Date:   Sat Nov 7 11:06:22 2020 +0000

    Added Bootstrap licensing info to README

commit 7cf6b0b619d4446234e20e9215d5a0ea72b7b9f5
Author: Bart Busschots <opensource@bartificer.net>
Date:   Sat Nov 7 11:04:51 2020 +0000

    Moved from plain HTML to Bootstrap

commit c88546e0b5ddd60e8f0123181a6b6356ca6eee59
Author: Bart Busschots <opensource@bartificer.net>
Date:   Sat Nov 7 11:03:19 2020 +0000

    Fixed silly typo in README

commit d58f072cabeee272d85acaefe7e814cfe5a81896
Author: Bart Busschots <opensource@bartificer.net>
Date:   Sat Nov 7 11:00:27 2020 +0000

    initial version
bart-imac2018:pbs105a bart%

Git Diff

We’ve already used git diff in a previous instalment, but let’s refresh our memories of what we already know about the command before we learn anything new.

A Quick Refresher

As it stands, our sample project contains an HTML page and a README file, and both contain references to PBS 104, which we should update to PBS 105. Additionally, there is a silly typo in the HTML file: I accidentally put a Markdown link into an HTML document!

Let’s fix both, and let’s use this as an opportunity to refresh our memories from last week and commit both sets of changes as separate commits. You’ll find updated versions of both README.md and index.html in the folder pbs105a-v2 in the instalment ZIP. Copy those two files over the ones in the repository.

Firstly, we can now use git diff with no arguments to see all changes in all files (since I don’t want a pager, I’m using Git’s --no-pager option):

bart-imac2018:pbs105a bart% git --no-pager diff
diff --git a/README.md b/README.md
index 6fc5cdc..de6776c 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # Hello World Mark II

-This is a dummy project for use as an example in instalment 104 of the [Programming by Stealth](https://pbs.bartificer.net/) blog/podcast series.
+This is a dummy project for use as an example in instalment 105 of the [Programming by Stealth](https://pbs.bartificer.net/) blog/podcast series.

 This project contains a single HTML 5 web pages that says hello to the world.

diff --git a/index.html b/index.html
index 6cee467..9bf599c 100644
--- a/index.html
+++ b/index.html
@@ -10,10 +10,10 @@
 <body>
 <header class="container">
 <h1 class="display-1">Hello World!!!</h1>
-<p class="lead">Welcome to some dummy content for instalment 104 of the <a href="https://pbs.bart.ficer.net/" target="_blank" rel="noopener">Programming by Stealth</a> blog/podcast series.</p>
+<p class="lead">Welcome to some dummy content for instalment 105 of the <a href="https://pbs.bart.ficer.net/" target="_blank" rel="noopener">Programming by Stealth</a> blog/podcast series.</p>
 </header>
 <main class="container">
-<p><em>Hello World</em> is a long-standing programming tradition, [learn more on Wikipedia](https://en.wikipedia.org/wiki/%22Hello,_World!%22_program).</p>
+<p><em>Hello World</em> is a long-standing programming tradition, <a href="https://en.wikipedia.org/wiki/%22Hello,_World!%22_program" target="_blank" rel="noopener">learn more on Wikipedia</a>.</p>

 <h2>Bart's Programming History in <em>Hello Worlds</em></h2>

bart-imac2018:pbs105a bart%

You can see there is just one change in README.md, and there are two changes in index.html.

Let’s start by staging just the typo fix. As we learned in the previous instalment, we do that with git add and the --patch flag:

git add --patch index.html

We’ll first be offered all the changes to the file as a single hunk. We want to get more granular, so type s then enter to split the hunk. We’re now offered a hunk representing the change in instalment number. That’s not the change we want to stage at the moment, so hit n and enter to decline it. Now we’re offered the typo fix, so hit y and enter to stage it.

Remember that we can see the changes to just a single file by adding that file as the argument to git diff. We can now see the remaining difference between our working copy and the staged version with git diff:

bart-imac2018:pbs105a bart% git diff index.html
diff --git a/index.html b/index.html
index 3c0417f..9bf599c 100644
--- a/index.html
+++ b/index.html
@@ -10,7 +10,7 @@
 <body>
 <header class="container">
 <h1 class="display-1">Hello World!!!</h1>
-<p class="lead">Welcome to some dummy content for instalment 104 of the <a href="https://pbs.bart.ficer.net/" target="_blank" rel="noopener">Programming by Stealth</a> blog/podcast series.</p>
+<p class="lead">Welcome to some dummy content for instalment 105 of the <a href="https://pbs.bart.ficer.net/" target="_blank" rel="noopener">Programming by Stealth</a> blog/podcast series.</p>
 </header>
 <main class="container">
 <p><em>Hello World</em> is a long-standing programming tradition, <a href="https://en.wikipedia.org/wiki/%22Hello,_World!%22_program" target="_blank" rel="noopener">learn more on Wikipedia</a>.</p>
bart-imac2018:pbs105a bart%

And, we can see the difference between the staged copy and the previous commit by adding the --cached flag:

bart-imac2018:pbs105a bart% git diff --cached index.html
diff --git a/index.html b/index.html
index 6cee467..3c0417f 100644
--- a/index.html
+++ b/index.html
@@ -13,7 +13,7 @@
 <p class="lead">Welcome to some dummy content for instalment 104 of the <a href="https://pbs.bart.ficer.net/" target="_blank" rel="noopener">Programming by Stealth</a> blog/podcast series.</p>
 </header>
 <main class="container">
-<p><em>Hello World</em> is a long-standing programming tradition, [learn more on Wikipedia](https://en.wikipedia.org/wiki/%22Hello,_World!%22_program).</p>
+<p><em>Hello World</em> is a long-standing programming tradition, <a href="https://en.wikipedia.org/wiki/%22Hello,_World!%22_program" target="_blank" rel="noopener">learn more on Wikipedia</a>.</p>

 <h2>Bart's Programming History in <em>Hello Worlds</em></h2>

bart-imac2018:pbs105a bart%

These are all the uses of git diff we’ve seen so far.
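
To see these comparisons side by side, here is a small sketch in a throwaway repository (the folder and the files a.txt and b.txt are made up for the demo). It also uses one form we haven’t met in this series yet, git diff HEAD, which compares the working copy directly to the last commit, staged or not. The --name-only flag just lists the changed files instead of printing the full diff.

```shell
# Throwaway demo repo: a.txt and b.txt are made-up files.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
git config user.name 'Demo User'        # identity for this demo repo only
git config user.email 'demo@example.com'
printf 'one\n' > a.txt
printf 'one\n' > b.txt
git add a.txt b.txt
git commit -q -m 'initial version'

printf 'two\n' >> a.txt   # edit a.txt and stage the change
git add a.txt
printf 'two\n' >> b.txt   # edit b.txt but leave the change unstaged

unstaged=$(git diff --name-only)          # working copy vs staged version
staged=$(git diff --cached --name-only)   # staged version vs last commit
both=$(git diff HEAD --name-only)         # working copy vs last commit
echo "unstaged: $unstaged"
echo "staged: $staged"
echo "either:"
echo "$both"
```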

Before we move on to explore the command further, let’s commit our changes.

First, the typo fix we’ve already staged:

git commit -m 'Fixed Markdown link in HTML file'

Now, let’s stage and commit the instalment number increments:

git add README.md index.html
git commit -m "Updated PBS instalment number to 105 in all files"

Using git diff To Compare Arbitrary Versions of a File

To view differences between what we have now and an arbitrary commit we need a way of specifying a commit. This is where those SHA1 hashes we mentioned at the start of our exploration of Git come in. Each commit has a unique hash, so that is how we refer to it.

Remember, we can see all our commits with git log:

bart-imac2018:pbs105a bart% git --no-pager log
commit 27d4f35937b443a9d660a93e5bd0b4e6d676ee7f (HEAD -> main)
Author: Bart Busschots <opensource@bartificer.net>
Date:   Sat Nov 7 12:18:09 2020 +0000

    Updated PBS instalment number to 105 in all files

commit ff15d677294572f39e1dc3a5d63b51c4f6ffa6b8
Author: Bart Busschots <opensource@bartificer.net>
Date:   Sat Nov 7 12:17:11 2020 +0000

    Fixed Markdown link in HTML file

commit ff8bc623afaabb9c2bcde5bc1bc492dc685a2f3e
Author: Bart Busschots <opensource@bartificer.net>
Date:   Sat Nov 7 11:08:37 2020 +0000

    Added my history of programming in Hello World programs

commit 2e3a0ce5bfa15033a7c5cffdced2947fbbb7fe2a
Author: Bart Busschots <opensource@bartificer.net>
Date:   Sat Nov 7 11:08:24 2020 +0000

    Correct semantic markup by moving title and lead into header and adding a footer

commit 80025b1e3e893b73264bc88345486ad409f3a143
Author: Bart Busschots <opensource@bartificer.net>
Date:   Sat Nov 7 11:06:34 2020 +0000

    Added link to Wikipedia article on the Hello World meme

commit a6d4bfda8e86618cd0e95185e1ced17e411467ef
Author: Bart Busschots <opensource@bartificer.net>
Date:   Sat Nov 7 11:06:22 2020 +0000

    Added Bootstrap licensing info to README

commit 7cf6b0b619d4446234e20e9215d5a0ea72b7b9f5
Author: Bart Busschots <opensource@bartificer.net>
Date:   Sat Nov 7 11:04:51 2020 +0000

    Moved from plain HTML to Bootstrap

commit c88546e0b5ddd60e8f0123181a6b6356ca6eee59
Author: Bart Busschots <opensource@bartificer.net>
Date:   Sat Nov 7 11:03:19 2020 +0000

    Fixed silly typo in README

commit d58f072cabeee272d85acaefe7e814cfe5a81896
Author: Bart Busschots <opensource@bartificer.net>
Date:   Sat Nov 7 11:00:27 2020 +0000

    initial version
bart-imac2018:pbs105a bart%
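
Full hashes are a lot to copy and paste. Git’s own messages use abbreviated hashes, and Git will accept an abbreviation anywhere it accepts a full hash, as long as it is unambiguous within the repository. A minimal sketch in a throwaway repo (the folder and file names are made up for the demo):

```shell
# Throwaway demo repo with two commits.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
git config user.name 'Demo User'        # identity for this demo repo only
git config user.email 'demo@example.com'
echo 'v1' > file.txt
git add file.txt
git commit -q -m 'initial version'
echo 'v2' > file.txt
git commit -q -am 'second version'

# A compact history: one line per commit, short hash then message.
git --no-pager log --oneline

full=$(git rev-parse HEAD)            # full hash of the newest commit
short=$(git rev-parse --short HEAD)   # abbreviated form of the same hash
resolved=$(git rev-parse "$short")    # Git expands the short form again
echo "$short is short for $resolved"
```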

Comparing the Working Copy to a Previous Commit

To compare the current working copy of a file to itself in a previous commit, pass the commit as the first argument, and the path to the file as the second, e.g., to see all the changes in README.md since the very first commit (hash d58f072cabeee272d85acaefe7e814cfe5a81896) use:

bart-imac2018:pbs105a bart% git --no-pager diff d58f072cabeee272d85acaefe7e814cfe5a81896 README.md
diff --git a/README.md b/README.md
index d267cf7..de6776c 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,7 @@
 # Hello World Mark II

-This is a dummy project for use as an example is instalment 104 of the [Programming by Stealth](https://pbs.bartificer.net/) blog/podcast series.
+This is a dummy project for use as an example in instalment 105 of the [Programming by Stealth](https://pbs.bartificer.net/) blog/podcast series.

-This project contains a single HTML 5 web pages that says hello to the world.
\ No newline at end of file
+This project contains a single HTML 5 web pages that says hello to the world.
+
+This project embeds a copy of the open source Bootstrap 4.5 CSS stylesheet created by [Twitter Inc.](https://twitter.com/) and released under the [MIT License](https://github.com/twbs/bootstrap/blob/main/LICENSE).
\ No newline at end of file
bart-imac2018:pbs105a bart%
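
The same idea can be tried safely in a throwaway repo. This sketch (the folder name and file contents are made up) looks up the very first commit with git rev-list --max-parents=0, which lists commits that have no parent, and then compares the working copy to it. The -- before the file name is optional here, but it makes it explicit to Git that what follows is a path and not a commit.

```shell
# Throwaway demo repo: two commits plus one uncommitted edit.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
git config user.name 'Demo User'        # identity for this demo repo only
git config user.email 'demo@example.com'
echo 'line one' > README.md
git add README.md
git commit -q -m 'initial version'
echo 'line two' >> README.md
git commit -q -am 'added a second line'
echo 'line three' >> README.md          # an edit we have not committed

# Find the commit with no parent, i.e. the very first one.
first=$(git rev-list --max-parents=0 HEAD)

# Compare the working copy of README.md to that first commit.
changes=$(git diff "$first" -- README.md)
echo "$changes"
```

The diff shows both the committed second line and the uncommitted third line as additions, because the comparison is against the working copy, not the latest commit.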

Comparing Previous Commits

If we want to see how a file changed between two commits in the past we can pass two hashes before the file path. For example, we can see what changed in the HTML file between the second commit and the second from last commit with:

bart-imac2018:pbs105a bart% git --no-pager diff c88546e0b5ddd60e8f0123181a6b6356ca6eee59 ff15d677294572f39e1dc3a5d63b51c4f6ffa6b8 index.html
diff --git a/index.html b/index.html
index 111dff8..3c0417f 100644
--- a/index.html
+++ b/index.html
@@ -3,9 +3,66 @@
 <head>
   <meta charset="utf-8" />
   <title>Hello World!</title>
+
+  <!-- include a local copy of Boostrap 4.5 -->
+  <link rel="stylesheet" href="contrib/Bootstrap4.5/bootstrap.min.css">
 </head>
 <body>
-<h1>Hello World!!!</h1>
-<p>Welcome to some dummy content for instalment 104 of the <a href="https://pbs.bart.ficer.net/" target="_blank" rel="noopener">Programming by Stealth</a> blog/podcast series.</p>
+<header class="container">
+<h1 class="display-1">Hello World!!!</h1>
+<p class="lead">Welcome to some dummy content for instalment 104 of the <a href="https://pbs.bart.ficer.net/" target="_blank" rel="noopener">Programming by Stealth</a> blog/podcast series.</p>
+</header>
+<main class="container">
+<p><em>Hello World</em> is a long-standing programming tradition, <a href="https://en.wikipedia.org/wiki/%22Hello,_World!%22_program" target="_blank" rel="noopener">learn more on Wikipedia</a>.</p>
+
+<h2>Bart's Programming History in <em>Hello Worlds</em></h2>
+
+<pre class="border rounded p-3 bg-success text-light">
+// 1997 - Java
+public class HelloWorld{
+  public static void main(String[] args){
+    System.out.println('Hello World!');
+  }
+}
+
+// 1998 — JavaScript
+document.write("Hello World!&lt;br&gt;");
+
+// 1998 — C++
+#include <iostream>
+using namespace std;
+int main() {
+  cout << "Hello World!\n";
+  return 0;
+}
+
+-- 1998 SQL
+SELECT "Hello World\n";
+
+# 1999 Maple
+"Hello World!";
+
+# 1999 - PHP
+&lt;?php echo 'Hello World!' ?&gt;
+
+# 2000 - Perl
+print "Hello, World!\n";
+
+; 2000 - LISP
+(format t "Hello, World!~%"))
+
+/* 2001 - C */
+#include <stdio.h>
+int main(void){
+  printf("Hello World!\n);
+  return 0;
+}
+
+// 2015 - NodeJS
+console.info('Hello World!');
+</pre>
+
+</main>
+<footer class="container small text-muted">&copy; Bart Busschots 2020 — Released under <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/" target="_blank" rel="noopener">Creative Commons Attribution-NonCommercial-ShareAlike</a>.</footer>
 </body>
 </html>
\ No newline at end of file
bart-imac2018:pbs105a bart%
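
When a diff between two commits is this long, a summary can be easier to digest. This sketch uses a throwaway repo (folder and file contents made up for the demo) to show the --stat flag, and also that the order of the two commits matters: the first is treated as the old version and the second as the new one, so swapping them turns additions into removals.

```shell
# Throwaway demo repo with three commits to index.html.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
git config user.name 'Demo User'        # identity for this demo repo only
git config user.email 'demo@example.com'
echo '<h1>Hello</h1>' > index.html
git add index.html
git commit -q -m 'initial version'
old=$(git rev-parse HEAD)               # remember the first commit
echo '<p>one</p>' >> index.html
git commit -q -am 'added first paragraph'
echo '<p>two</p>' >> index.html
git commit -q -am 'added second paragraph'
new=$(git rev-parse HEAD)               # remember the latest commit

# Summary only: which files changed and by how many lines.
git --no-pager diff --stat "$old" "$new" -- index.html

# Count added lines going old -> new, and removed lines going new -> old.
# The patterns skip the '+++' and '---' file header lines.
added=$(git diff "$old" "$new" -- index.html | grep -c '^+[^+]')
removed=$(git diff "$new" "$old" -- index.html | grep -c '^-[^-]')
echo "old to new adds $added lines, new to old removes $removed lines"
```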

Exporting the Contents of a Commit with git archive

Comparing differences is one thing, but what if you want to really travel in time? What if you want to get a copy of everything like it was at some point in the past? Maybe you want to run a version of your app from last year, or maybe you want to share a specific version of your book with a friend.

The git archive command is designed to export the contents of a commit without its history. In other words, just the files and folders, nothing else. When you go to GitHub and download the current version of a repo as a ZIP file, you’re using git archive!

As its name implies, git archive will output the contents of the commit as a single archive file. Unless told otherwise it writes the archive in the tar format, but when you name an output file with --output, Git infers the format from the file extension, so a file name ending in .zip gets you a ZIP file. Depending on your OS, you may have other formats available. You can see the list of formats available to you by running the command with --list as the only argument:

bart-imac2018:pbs105a bart% git archive --list
tar
tgz
tar.gz
zip
bart-imac2018:pbs105a bart%

To explicitly specify a format use the --format flag followed by one of the listed formats.

Many of the defaults git archive uses make intuitive sense: it includes all files and folders, and it picks the format from the output file’s extension. But one default may catch you out: it writes the data to standard out (STDOUT). To write to a file instead you have to use the --output flag.
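
Writing to STDOUT can actually be handy, because it lets you pipe the archive straight into another tool. As a sketch, assuming the standard tar command is installed and using a throwaway repo with made-up files, this lists what an archive would contain without ever saving it to disk:

```shell
# Throwaway demo repo with a couple of files and a folder.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/main   # name the branch main, whatever the default is
git config user.name 'Demo User'        # identity for this demo repo only
git config user.email 'demo@example.com'
mkdir contrib
echo '<h1>Hello</h1>' > index.html
echo '# Demo' > README.md
echo 'body {}' > contrib/style.css
git add .
git commit -q -m 'initial version'

# Send a tar archive of main to STDOUT and let tar list its contents.
listing=$(git archive --format=tar main | tar -tf -)
echo "$listing"
```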

Regardless of the format we use, where we want to send the output, or what we want to include in the archive, we always have to specify where to source the archive from. We can do that by passing a branch name or a commit hash as the first argument.

We haven’t started to create branches yet, so our repository only has its default branch, main. To create an archive of our repository in its current state we can simply run:

git archive --output ../pbs105a.latest.zip main

Note that this command creates the ZIP in our repository’s parent folder because, on the terminal, .. means the parent of the current folder.

To get a full copy of our repository as it was after we fixed the bad link, but before we changed the instalment number, we specify the commit where we fixed the typo (ff15d677294572f39e1dc3a5d63b51c4f6ffa6b8) rather than the branch name:

git archive --output ../pbs105a.beforeInstalmentBump.zip ff15d677294572f39e1dc3a5d63b51c4f6ffa6b8

If you extract that file and open index.html in a browser you’ll see that the link to Wikipedia is fixed, but the instalment number still says 104.

Finally, if you only want to include some of the files or folders in the specified branch/commit, pass the paths you want as additional arguments. To get just the HTML and README files but nothing else use:

git archive --output ../pbs105a.withoutContrib.zip main index.html README.md
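
If what you really want is the files themselves rather than an archive, one possible approach is to pipe a tar archive directly into tar and have it extract into a folder. This sketch assumes the standard tar command is installed, and uses a throwaway repo and output folder with made-up names:

```shell
# Throwaway demo repo with a couple of files and a folder.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/main   # name the branch main, whatever the default is
git config user.name 'Demo User'        # identity for this demo repo only
git config user.email 'demo@example.com'
mkdir contrib
echo '<h1>Hello</h1>' > index.html
echo '# Demo' > README.md
echo 'body {}' > contrib/style.css
git add .
git commit -q -m 'initial version'

# Extract just two files from main into a separate, empty folder.
out=$(mktemp -d)
git archive --format=tar main index.html README.md | tar -xf - -C "$out"
ls "$out"
```

Only index.html and README.md end up in the output folder; the contrib folder is left out because its path was not listed.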

Moving Our Working Copy Through Time

Exporting to a ZIP file is all well and good, but having to extract a ZIP and then go open the files you want to see is a bit of a hassle — wouldn’t it be nice to be able to transport your working copy to a previous point in time?

The good news is that you can do it, and do it easily, but there is a catch — you need to do it really carefully, or you could end up with your repository in a headless state! OK, headless is not the official term, it’s my overly dramatic nickname for what Git refers to as detached HEAD state.

Before we explore the concept of headlessness in Git, let’s lay down some ground rules for how you can safely time-travel your working copy:

  1. Always commit all changes before time travelling!
  2. Until you understand either branching or stashing, do not edit any files while the working copy is in the past!

We’ll be exploring both branching and stashing soon, but for now, only time-travel from a clean working copy, and look, but do not touch, anything in the past!
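
If you’d like the computer to check the first rule for you, git status has a --porcelain flag that prints one line per changed or untracked file and nothing at all when the working copy is clean. The helper below is only a sketch: the function name is_clean is made up, and the repo is a throwaway one.

```shell
# Hypothetical helper: succeeds only when there is nothing uncommitted.
is_clean() {
  [ -z "$(git status --porcelain)" ]
}

# Throwaway demo repo to try it out.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
git config user.name 'Demo User'        # identity for this demo repo only
git config user.email 'demo@example.com'
echo '# Demo' > README.md
git add README.md
git commit -q -m 'initial version'

if is_clean; then before='clean'; else before='dirty'; fi
echo "after committing: $before"

echo 'an uncommitted edit' >> README.md
if is_clean; then after='clean'; else after='dirty'; fi
echo "after editing: $after"
```

Note that untracked files also count as dirty in this check, which errs on the side of caution.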

Finally, the emergency escape hatch is the command:

git checkout --force main

This will discard all changes in the working copy and bring us back to the most recent commit.

Introducing git checkout

The command for loading a specific commit from the repository into the working copy is git checkout. This is a very important command that we’ll get to see a lot in the coming instalments. Usually, we’ll be using it to jump between branches, but it can also travel through time!

At its simplest level, git checkout needs just one argument: the commit it should update the working copy from. Commits can be specified directly by hash, or the most recent commit on any branch can be referred to by branch name. This is how our escape hatch command works. It specifies that we want to check out the most recent commit on the branch main, and that we insist on doing so even if that means discarding changes (that’s what the --force flag does).

For today, we’ll simply specify commit hashes. Let’s go back to the very start, and see how our project looked at the very beginning by checking out the first commit (which git log shows us has the hash d58f072cabeee272d85acaefe7e814cfe5a81896):

bart-imac2018:pbs105a bart% git checkout d58f072cabeee272d85acaefe7e814cfe5a81896
Note: checking out 'd58f072cabeee272d85acaefe7e814cfe5a81896'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at d58f072 initial version
bart-imac2018:pbs105a bart%

Notice that Git explicitly tells us we have entered a detached HEAD state. Git will also point this out to us when we do a git status:

bart-imac2018:pbs105a bart% git status
HEAD detached at d58f072
nothing to commit, working tree clean
bart-imac2018:pbs105a bart%
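
You can also ask Git the same question in a script-friendly way. The git symbolic-ref command prints the branch HEAD is attached to, and with the -q flag it quietly fails when HEAD is detached. A sketch, with a made-up function name (where_am_i) and a throwaway repo:

```shell
# Hypothetical helper: report whether HEAD is on a branch or detached.
where_am_i() {
  if branch=$(git symbolic-ref -q --short HEAD); then
    echo "on branch $branch"
  else
    echo "detached at $(git rev-parse --short HEAD)"
  fi
}

# Throwaway demo repo with two commits.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/main   # name the branch main, whatever the default is
git config user.name 'Demo User'        # identity for this demo repo only
git config user.email 'demo@example.com'
echo 'v1' > file.txt
git add file.txt
git commit -q -m 'initial version'
first=$(git rev-parse HEAD)             # remember the first commit
echo 'v2' > file.txt
git commit -q -am 'second version'

state1=$(where_am_i)      # on the branch
git checkout -q "$first"  # travel back in time (-q hides the long warning)
state2=$(where_am_i)      # detached
git checkout -q main      # and return to the present
state3=$(where_am_i)
printf '%s\n' "$state1" "$state2" "$state3"
```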

Take a moment to explore the current state of both files.

Now, let’s move forward in time to the point when we switched to Bootstrap:

bart-imac2018:pbs105a bart% git checkout 7cf6b0b619d4446234e20e9215d5a0ea72b7b9f5
Previous HEAD position was d58f072 initial version
HEAD is now at 7cf6b0b Moved from plain HTML to Bootstrap
bart-imac2018:pbs105a bart%

Again, take a moment to view the HTML file in the browser to see the change.

Now, so we get used to using the emergency escape hatch, let’s intentionally make an edit to one of the files by adding some text to the end of the README (printf is used here because not every shell’s echo turns \n into a newline):

printf '\nDid I accidentally make an edit? Ooopsie!\n' >> README.md

We can now see that we have indeed made a change with git status:

bart-imac2018:pbs105a bart% git status
HEAD detached at 7cf6b0b
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   README.md

no changes added to commit (use "git add" and/or "git commit -a")
bart-imac2018:pbs105a bart%

If we try to go back to the last commit on the main branch normally Git will stop us:

bart-imac2018:pbs105a bart% git checkout main
error: Your local changes to the following files would be overwritten by checkout:
	README.md
Please commit your changes or stash them before you switch branches.
Aborting
bart-imac2018:pbs105a bart%

But that’s OK, we have an emergency escape hatch:

bart-imac2018:pbs105a bart% git checkout --force main
Previous HEAD position was 7cf6b0b Moved from plain HTML to Bootstrap
Switched to branch 'main'
bart-imac2018:pbs105a bart%

As git status will show us, all is back to normal now, but we have lost our edit:

bart-imac2018:pbs105a bart% git status
On branch main
nothing to commit, working tree clean
bart-imac2018:pbs105a bart%

Detached Heads? (Git’s HEAD Reference Explained)

In a Git repository, the working copy consists of the files and folders from a specific commit and may or may not include some edits, depending on where you are in your workflow. We think of the commit that the working copy is based on as being where we are in the history of our repository. This sense of place also plays an important role when we’re committing our changes — we know the parent commit of our new commit will be the current commit.

Under the hood, Git stores our place in the repository’s history as a special reference named HEAD.

In Git jargon the most recent commit on any branch is referred to as the head of the branch. This commit is important because it is the only commit on the branch to which a new commit can be added. Like new growth appears on the end of a real branch, new commits are added to the head of a Git branch.

When you run git status and it tells you you are On branch main, what it means is that HEAD is pointing at the head of the branch main.

When you check out a commit by its hash, you move the HEAD reference directly to that commit rather than to a branch. Git will still let you commit from there, as its own warning message says, but those commits won’t belong to any branch, so they are easily lost when you check out something else. Branches only grow from their head; you can’t inject commits into the middle of one.

In Git’s jargon, when HEAD points directly at a commit instead of at the head of a branch, HEAD is detached.

The HEAD reference is Git’s under-the-hood implementation of a sense of place in your repository. Some Git GUIs treat HEAD as an implementation detail, others show it to you. If you do see it on a GUI, you now know what it is.
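
If you’re curious, you can peek at HEAD yourself. In a repository using Git’s default storage layout, it’s simply a small text file named HEAD inside the hidden .git folder. The sketch below uses a throwaway repo with made-up file names, and assumes that default layout:

```shell
# Throwaway demo repo with two commits.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
git symbolic-ref HEAD refs/heads/main   # name the branch main, whatever the default is
git config user.name 'Demo User'        # identity for this demo repo only
git config user.email 'demo@example.com'
echo 'v1' > file.txt
git add file.txt
git commit -q -m 'initial version'
first=$(git rev-parse HEAD)             # remember the first commit
echo 'v2' > file.txt
git commit -q -am 'second version'

# On a branch, HEAD holds a pointer to that branch.
on_branch=$(cat .git/HEAD)
echo "on main:  $on_branch"

# After checking out a commit by hash, HEAD holds the hash itself.
git checkout -q "$first"
detached=$(cat .git/HEAD)
echo "detached: $detached"

git checkout -q main                    # back to the present
```

Look, but don’t touch: the .git folder is Git’s own territory, and editing files in there by hand is a good way to break a repository.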

Final Thoughts

In this instalment we learned to look into the past, but not bring versions of files from the past back to the future, or forward to a new alternative future. We’ll learn how to do the first of those things in the next instalment, and the second in the one after that.

Join the Community

Find us in the PBS channel on the Podfeet Slack.
