PBS 101 of X: Introducing Version Control (Git)

12 Sep 2020

In this first instalment of our second century we’re starting a short pivot away from programming itself and into one of the most important tools in a programmer’s toolkit: version control. Version control isn’t about version numbering conventions, alphas, betas, or anything like that. It’s about capturing the full history of your software’s existence so you can safely experiment and easily change your mind.

Programming is all about making choices: do I go with one approach or the other? Before you start writing your actual code one of the two options might seem simple, but once you start you often change your mind. Without version control, backing out of a decision is messy and error-prone, and changing your mind a second time is soul-destroying. With version control you simply label the decision point and start coding. If it works out, great. If not, you label your dead end, revert your code to how it was at the decision point, and start down the other path. If that proves even worse than your first path, you can simply label that new dead end, jump back to your first path, and continue where you left off.
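To make the label-and-revert idea concrete, here is a minimal Python sketch. It is a toy model only: the `ToyHistory` class, its method names, and the file name are all invented for illustration, and it says nothing about how Git or any other real tool actually stores history.

```python
import copy


class ToyHistory:
    """A toy model of labelled snapshots (hypothetical, for illustration only)."""

    def __init__(self, files):
        self.files = dict(files)  # the working copy: file name -> contents
        self.snapshots = {}       # label -> saved copy of the working copy

    def label(self, name):
        """Record the current state of the working copy under a label."""
        self.snapshots[name] = copy.deepcopy(self.files)

    def revert(self, name):
        """Replace the working copy with the state saved under a label."""
        self.files = copy.deepcopy(self.snapshots[name])


project = ToyHistory({"app.js": "// shared starting code"})
project.label("decision-point")

# Try the first approach, label where we got to, then back out of it.
project.files["app.js"] += "\n// approach A"
project.label("approach-a")
project.revert("decision-point")

# Try the second approach, decide it is worse, and label that dead end too.
project.files["app.js"] += "\n// approach B"
project.label("approach-b")

# Jump back to the first path and continue where we left off.
project.revert("approach-a")
print(project.files["app.js"])
```

Nothing is ever thrown away in this sketch: both paths stay available under their labels, which is the property that makes changing your mind cheap.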

Version control is extremely powerful, but with that power comes complexity. There are a lot of concepts to internalise, and because the tools are so flexible, an infinity of possible approaches you can take (and nerds can argue over). It takes time to find your perfect version control strategy, and the only way to get there is to get stuck in, to make mistakes, and to learn from them. It’s OK to change your mind. It’s OK to do things differently to your opinionated geek friend. Do what works for you, and just keep tweaking until you’re happy.

We’re going to focus on using version control to manage our code, but it can be used to manage anything. The Taming the Terminal book is managed through version control, as indeed are these very show notes you’re reading now! If you can create it on a computer you can version it if you choose!

Matching Podcast Episode

Listen along to this instalment on episode 652 of the Chit Chat Across the Pond Podcast.

You can also download the MP3.

Two Totally Different Universes of Version Control

There is a massive dividing line through the middle of the universe of version control systems — there is the client-server model and the peer-to-peer, or distributed, model. The client-server model has the advantage of intuitive simplicity, but it’s fallen out of favour for many good reasons, and peer-to-peer is very much in the ascendancy these days.

The Client-Server Approach

Presumably because it’s so intuitive, this is how version control got started.

The model is built around the concept of a single central server holding the canonical truth about the versioned content. Creators can check out a given version of the content, make changes, and then submit those back to the server.

Whether you have one contributor or a million, there is never any doubt about what the official state of the code is, and, when two developers make conflicting changes, the rules for who has to re-do their work are trivial — first-come-first-served! If the server gets your version before Alice’s, your commit will be accepted, Alice’s will be rejected, and she’ll have to figure out how to re-do her changes so they’re compatible with yours.
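The first-come-first-served rule can be sketched in a few lines of Python. This is a deliberately crude toy: `ToyCentralServer` and its methods are hypothetical names, and the sketch rejects any commit based on an out-of-date revision, whereas real client-server systems are more fine-grained about what counts as a conflict.

```python
class ToyCentralServer:
    """A toy central server holding the one canonical copy (illustration only)."""

    def __init__(self):
        self.revision = 0   # number of commits accepted so far
        self.content = ""   # the canonical content

    def check_out(self):
        """Hand out the current revision number and content."""
        return self.revision, self.content

    def commit(self, based_on, new_content):
        """Accept a commit only if it was based on the latest revision."""
        if based_on != self.revision:
            return False    # someone else got there first
        self.content = new_content
        self.revision += 1
        return True


server = ToyCentralServer()

# You and Alice both check out the same starting point.
your_base, your_copy = server.check_out()
alice_base, alice_copy = server.check_out()

# Your commit reaches the server first, so Alice's is rejected.
you_accepted = server.commit(your_base, your_copy + "your change\n")
alice_accepted = server.commit(alice_base, alice_copy + "Alice's change\n")

# Alice has to check out again and re-do her change on top of yours.
alice_base, alice_copy = server.check_out()
alice_retry = server.commit(alice_base, alice_copy + "Alice's change\n")

print(you_accepted, alice_accepted, alice_retry)
```

Notice that every step involves talking to the server, which is exactly the dependency the rest of this section is about.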

For large institutions the need to run a server is not problematic, and the centralised control seems darn appealing, but things look very different outside of such institutions. Open Source communities are all about equal collaboration, rather than centralised management. Sure, one of the nerds can probably host the server, but who should be trusted to become that single point of control and failure? It’s not a collaboration of equals if everything depends on one contributor hosting and managing a single server that holds the only canonical truth!

As for home users — needing to run a server just to keep track of your versions doesn’t sound at all appealing!

Then you have the fact that you need access to the server to work effectively. You can’t create a restore point without server access, so, you need network access each time you make a decision you might want to back out of. That’s not a problem if you’re a code monkey in a cubicle farm, but what if you want to make the most of some quiet time in a log cabin to really get some work done, or, as hard as it is to imagine in 2020, on a 12 hour direct flight from Dublin to LAX? (Been there, done that, got a lot of good coding done 🙂)

This is where both proprietary and open source version control started. In the open-source world the first client-server version control system to get traction was CVS (the Concurrent Versions System) in 1990, followed by the pinnacle of client-server open source version control, SVN (Subversion), first released a decade later in 2000.

Personally, I had the misfortune of needing to use CVS briefly before being allowed to switch to SVN. I actually really liked SVN, and since I was the kind of nerd who ran my own server, I stuck with it a lot longer than most, continuing to use it well into the twenty teens.

The Peer-to-Peer AKA Distributed Model

When you check out code from a version control server you only get a snapshot of the project’s history. Only the central server knows all. With a peer-to-peer model, every copy has the full history, so everyone knows everything.

In this distributed universe, there is no server. If there are three collaborators, all three have complete version histories, and all three are equal. They can offer updates to each other, but there is no onus on anyone to accept changes from anyone else. If everyone cooperates, everyone’s versions stay in sync, if anyone gets fed up and leaves, their version diverges from the others.

Distributed version control systems don’t enforce such utter chaos. An organisation can choose to host a copy that is considered canonical, but that’s a human decision, and is in no way technically enforced. The agreed-upon canonical copy is technologically no different to every other copy. What each developer has on their laptop is as complete as the officially blessed copy. Every contributor has everything, so they are free to go their own way whenever they wish.
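Here is a rough Python sketch of the distributed idea, under some loudly stated assumptions: `ToyRepo` and its methods are made up for this illustration, a history is modelled as nothing more than a list of commit messages, and a copy only accepts changes that extend its own history (a real tool would offer to merge diverged histories instead of refusing).

```python
class ToyRepo:
    """A toy repository whose history is just a list of commit messages."""

    def __init__(self, history=None):
        self.history = list(history or [])

    def commit(self, message):
        """Record a change locally; no server or network involved."""
        self.history.append(message)

    def clone(self):
        """Make a new copy that carries the complete history."""
        return ToyRepo(self.history)

    def pull_from(self, other):
        """Choose to accept another copy's changes, if they extend our history."""
        ours = len(self.history)
        if other.history[:ours] != self.history:
            return False  # the histories have diverged
        self.history.extend(other.history[ours:])
        return True


# The 'blessed' copy is just another ToyRepo; only people treat it as special.
blessed = ToyRepo(["initial commit", "add README"])
alice = blessed.clone()
bob = blessed.clone()

# Alice works offline, and Bob later chooses to accept her change.
alice.commit("fix typo (written offline)")
bob_accepted = bob.pull_from(alice)

# Nobody has to accept anything: Alice and Bob now go their own ways.
bob.commit("Bob's experiment")
alice.commit("Alice's next change")
diverged = not alice.pull_from(bob)

print(bob_accepted, diverged, len(blessed.history))
```

The blessed copy never changes unless someone decides to update it, and every clone is a full peer from the moment it is created.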

Because every copy has everything, you don’t need a server at all if you’re a home user and you just want to code.

But even if you do want to share, the fact that every copy has everything is a real game-changer. For a start, you don’t need to be online to work! You can version away to your heart’s content as you whizz across the Atlantic Ocean in a cleverly-shaped metal tube, and then share all your work some time later, whenever it suits you. You can work away for a week or a month in your log cabin, and then share your work when you get home, or once a week when you go to town to get groceries and a sip of internet.

It’s obvious why home users and the open source world embraced peer-to-peer version control, but why has it also taken off in the corporate world?

Well, when you combine corporate policies and a good VPN with a distributed model you can still exercise central control by managing the official master copy, but you can also facilitate off-line working by remote and travelling employees. When they clone the official repository they can work offline for as long as they need to, and when they have internet access they can VPN to the office and synchronise their full copy with the master copy.

While there were distributed open source version control systems before the mid-2000s, that seems to be when the idea really took hold. In 2005 alone three potential heirs to SVN’s crown were released, GNU Bazaar, Mercurial, and Git. The GNU offering never quite got traction, but Mercurial really started to gather steam. The hands-down winner though is undoubtedly Git, because it was built specifically to manage the Linux kernel. Can there be a more emblematic open source project than that?

To say Git hit the ground running is an understatement. Linus Torvalds himself started the code on April 3rd 2005, announced it to the public on April 6th, and the Git project became self-hosted (i.e. Git was versioned in Git) from the very next day, April 7th. Four days to go from idea to functional enough to version its own code!

The next big milestone for Git was the release of the Linux 2.6.12 Kernel on June 16th, the first version of the kernel to be versioned using Git. Torvalds handed the project to Junio Hamano in July 2005, and Hamano has been the chief maintainer ever since.

Because Git proved itself capable of managing the most iconic open source project of all, it soon went viral, and it now absolutely dominates the open source world.

As you’ve probably guessed by now, we’ll be learning the concepts of distributed version control using Git.

Our Git Journey

So, just as we used JavaScript to learn the concepts of programming, we’re going to use Git to learn the concepts of distributed version control. We’re going to start from zero, and assume you have no previous version control experience. But where is our journey headed? My hope is to get to a place where you can manage your own code on your own computers, synchronise it to a remote copy for safe-keeping, and contribute to open source projects hosted on Git. Since this series is open source and hosted on Git, I’d ideally like you all to be able to fix my typos for me in the future 😉. Rather than emailing me to tell me about the typo, you’d be able to get a copy of the Git repository, make the changes, and then submit your changes to me for incorporation into the official copy of the series.

Phase 1 — Local Repositories

While we learn core version control and Git principles, we’ll keep things really simple — we’ll work with local repositories that are not shared in any way. All you’ll need is a copy of the Git command line tools and you’re good to go. You can get Git for Windows, Linux, or Mac, so everyone can play along!

We’ll be starting our learning journey on the command line so we truly understand what we’re doing. That’s not how Git gets used in the real world, so we’ll also be using some free and/or open source Git GUIs, and hopefully you’ll see how the under-the-hood Git commands map to Git GUIs. There are lots of great Git GUIs out there, and developers tend to have very strong opinions about them, so I won’t be recommending a specific GUI.

Phase 2 — A Server Just for Us

Once we’re comfortable managing our code in a local repository, we’ll throw in a little complexity by adding in server-hosted copies for our own use rather than for sharing.

There are two big reasons to have private server-hosted Git repositories. Firstly, they act as a backup should anything happen to your computer, and secondly, they allow you to easily work on your code from multiple different computers.

Phase 3 — Collaborating with Others

Once we’re comfortable maintaining server-hosted copies of our own repositories, we’ll be ready to step out into the big bad world of publicly shared repositories. We’ll learn how to get our own copy of someone else’s repository, make changes, either just for our own use, or for sharing with the original author(s).

Which Git Server?

Since we’ll be using Git servers from phase two on, the obvious question is which one? Since Git itself is open source you can host your own remote Git repository using nothing more than a Linux VM with Git installed and either SSH or a web server. Git can talk to remote repositories over either SSH or HTTP, so if you’re the kind of person who runs servers, hosting a remote Git repository is trivially easy.

However, not everyone is the kind of person who enjoys hosting their own Linux server, and this series is really not intended to teach Linux sysadmin skills, so we’ll out-source our server needs 🙂.

When it comes to Git-as-a-service two offerings stand head-and-shoulders above the rest — GitHub and GitLab.

GitHub started life as a web-based hosted Git solution. You can create repositories on their servers and access them over SSH or HTTPS as if you were running your own server, but GitHub took things up a notch by also adding a powerful web interface. In effect, GitHub is to Git what Gmail is to email — both a server and powerful web-based UI.

GitHub was originally independent, but they are now owned by Microsoft. They offer very generous free hosting of both public and private repositories, and the option to upgrade to paid tiers for additional more advanced features. While GitHub has always been a commercial product, the company has a long history of supporting open source software. For a long time their business model was to offer open source projects all their features for free, and to charge anyone who wanted a private repository. As the feature-set grew their offerings became a little more complex, and you can now host private repositories for free. However, one feature has remained constant — when you choose to make your repository public, you get more free stuff than when you choose to keep things private.

GitHub’s original business model was a simple freemium model; they never ever relied on selling ads or data. In other words, they are not, and never were, FreePI.

When Microsoft took over the company they tweaked the business model a little. Microsoft see GitHub as part of their contribution to the open source community rather than a profit centre, so they run it to break even, and have actually added extra features to the free tier and lowered their prices on the paid tiers.

GitLab started as a free and open source software package designed to let you host a private GitHub on your own server. The open source GitLab software lets you get all the usability of GitHub, but on the privacy of your own server. This works particularly well in the corporate world where GitLab servers can keep a company’s code safe and under control without depriving their developers of a nice user experience.

Like WordPress, GitLab expanded to become both an open source software project and a hosted services provider. Just as you can run your own WordPress blog or use one hosted for you by WordPress on their servers, you can run your own GitLab server, or use their hosted offering at gitlab.com.

Like GitHub, GitLab’s business model has always been based on selling services. Their focus has been more on the corporate market, so their free tier has always been more limited than GitHub’s. Their business model is clearly a freemium model, and they too are not and never have been FreePI. More recently GitLab have doubled down on their enterprise focus by splitting the software into two distinct offerings, the entirely open source GitLab CE (Community Edition), and GitLab EE (Enterprise Edition), an enhanced offering which extends the community edition with closed-source additional features targeted at large organisations.

Because of their long history of offering powerful free services to open source projects, because of Microsoft’s continuing pro-open-source management of the service, and because their free tier is so feature-rich, we’ll be using GitHub in this series. You won’t need to use GitHub to play along, you can run your own server or use an alternative hosted service like GitLab, but if you do, you’ll need to tweak the examples a little.

Final Thoughts

The aim of this instalment was to describe the proverbial forest before we focus in on a few specific trees.

Version control comes in two distinct flavours, client-server, and peer-to-peer, usually described as distributed. We’ll be going the distributed route.

There are lots of distributed version control systems to choose from, both open source and proprietary, but we’ll be focusing on by far the most popular of them all, the open source Git system.

Having chosen Git, we’ll learn new concepts using the official Git command line tools first, but then see how they are exposed in commonly used Git GUIs. Because there are so many Git GUIs, we won’t be recommending any specific one, though you’re likely to see quite a few screenshots of my personal favourite, the freemium cross-platform GitKraken.

When it comes to Git servers you’ll be able to use any server you want, but we’ll be using free GitHub accounts for our examples.

Finally, and most importantly, the aim of the next few instalments is to teach you how to manage your source code so you can change your mind repeatedly without losing any code, so you can easily share your code between your computers, and so you can back it up to a server. After that, we’ll learn to share your code with others, use code shared by others, and even contribute improvements to that shared code.

Join the Community

Find us in the PBS channel on the Podfeet Slack.
