The Advisory Boar

By Abhijit Menon-Sen

"Using Git to manage a web site" reprinted in Hacker Monthly

Some months ago, someone posted a link to my article on using Git to manage a web site to the Hacker News web site, where it was briefly quite popular.

In March, the founder of Hacker Monthly magazine ("The Best of Hacker News, in Print") contacted me to ask for permission to reprint my article in the next issue. I got a print copy and a year's digital subscription. I had only heard of the magazine in passing before, but it looks nice.

Hacker Monthly issue #11 is out now, and here's a PDF of my article.

Git: post-receive hook for XMPP notifications

I wrote long ago about the trouble I had with Net::XMPP while setting up a notification hook for Archiveopteryx, but I didn't think anyone would find the script itself particularly interesting. But people have asked me about it, so here it is.

Read more…

Managing release branches: git merge vs. p4 integrate

When the Archiveopteryx source code lived in Perforce, we would submit everything to the src/main branch, review and test, then use p4 integrate to merge selected changes into release branches like src/rel/2.0. The only changes we submitted directly to the latter branches were release-specific, like setting the version number in Jamsettings. We could safely re-run p4 integrate at any time, and it would show us only those changes that we had not already reviewed.

When we moved to git, we continued to work this way—development happened in master, and we would use git cherry-pick to integrate or backport selected changes into older release branches. New release branches were created by branching from the current master, and maintained the same way. We did this for almost two years and several releases, but it was not much fun.

There was no easy way to answer the question "Which commits do I need to consider for inclusion?" for any given release branch. In theory, git log --cherry-pick will tell you, but it doesn't work very well. We used to do monthly releases, but it was so painful to deal with the build-up of commits at release time that we were forced to backport changes in smaller batches throughout the month (but that was not, in itself, a bad thing).
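For readers who have not used this workflow, here is a minimal sketch of it in a throwaway repository. The branch name rel-2.0, the file names, and the commit messages are made up for illustration. One fix is backported with git cherry-pick -x, and git log --cherry-pick is then asked which commits on the main branch have no equivalent patch on the release branch. It behaves well in a toy case like this; the trouble described above shows up in real histories, where a backported change often needs adjusting and so no longer matches the original patch.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git config user.email dev@example.org   # placeholder identity for the demo
git config user.name "Demo Developer"

# A base commit, then a release branch (hypothetical name) forked from it.
echo base > file.txt
git add file.txt
git commit -q -m "base"
main=$(git symbolic-ref --short HEAD)    # master or main, depending on configuration
git branch rel-2.0

# Two changes land on the development branch.
echo fix > fix.txt
git add fix.txt
git commit -q -m "fix crash"
fix=$(git rev-parse HEAD)
echo feature > feature.txt
git add feature.txt
git commit -q -m "new feature"

# Backport only the fix; -x records the original commit id in the message.
git checkout -q rel-2.0
git cherry-pick -x "$fix" >/dev/null

# List commits on the development branch with no patch-equivalent commit
# on the release branch, i.e. the ones still to be considered.
pending=$(git log --cherry-pick --right-only --no-merges --format=%s "rel-2.0...$main")
echo "$pending"
```

Only "new feature" should be listed, because the backported fix produces the same patch on both branches.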

Read more…

Using Git with a central repository

This tutorial explains how to share a Git repository among developers. It is meant for small teams who are adopting Git for the first time, and want to get started quickly with a familiar setup before exploring Git's many new possibilities.

If you follow this route, you will end up with a single centrally-hosted repository that everyone in your group can use to publish their own work and fetch whatever others have published. People used to a centralised VCS will find this model easy to adjust to, but of course, each user's "working copy" will itself be a fully-fledged Git repository, and many new workflows are available to users as they learn more.

It would help if you're familiar with basic Git terminology and usage, but if not, you can skim through to find out which commands you need to read about and experiment with. (I recommend Git from the bottom up and the Git tutorial for an introduction.) I shall assume that everyone has git 1.6.5 or later installed, and that they have ssh access to the server that will host the repository.
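As a rough picture of where this ends up, the sketch below builds the whole arrangement on one machine. A local bare repository stands in for the one on the server, and the directory names alice and bob are hypothetical users. In a real setup the clone URL would be an ssh one, and --shared=group assumes all users belong to a common Unix group on the server.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

# The central repository: bare (no work tree) and group-writable.
git init -q --bare --shared=group central.git

# Each developer clones it; every clone is a full repository.
git clone -q central.git alice 2>/dev/null
git clone -q central.git bob 2>/dev/null

# Alice commits locally and publishes her work.
cd "$tmp/alice"
git config user.email alice@example.org   # placeholder identity
git config user.name "Alice"
echo hello > README
git add README
git commit -q -m "Add README"
branch=$(git symbolic-ref --short HEAD)
git push -q origin HEAD

# Bob fetches and merges whatever has been published.
cd "$tmp/bob"
git pull -q origin "$branch"
cat README
```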

Read more…

Git disaster recovery

I typed git commit and git push, and a few seconds later, the mains power died. Normally, I wouldn't have noticed, but my trusty UPS is broken, so for the first time in many years, every power glitch makes its presence felt; and now, I can fully experience the joy of being bitten in the rear by Ext4's delayed allocation.

When my machine came up again, the newly-created commit object and some associated tree objects were corrupted. refs/heads/master pointed to that corrupted commit, so most git commands died with this error message:

$ git log
fatal: object 54590b644cb542d30ec962c138a763dddc26aac0 is corrupted

To my great good fortune, my git push had completed before the power failed, so I knew I could recover everything from the remote repository. I flailed around a little before finding out how, but here's what ultimately worked for me.

First, I kept running git fsck and deleting the objects it complained about:

$ git fsck
fatal: object 54590b644cb542d30ec962c138a763dddc26aac0 is corrupted
$ rm -f .git/objects/54/590b644cb542d30ec962c138a763dddc26aac0

Then I copied the corrupted objects back from the remote repository one by one, using a trick Sam Vilain showed me on IRC:

$ ssh \
    "git cat-file commit 54590b644cb542d30ec962c138a763dddc26aac0" | \
    git hash-object -w -t commit --stdin

If I had deleted the corrupted objects and reset my HEAD to point to an older commit, a plain old git fetch should have retrieved the missing objects. I didn't think of that soon enough, and recovered the missing commit first, so git fetch thought everything was up to date. But fetching the objects one by one worked fine, and git fsck stopped complaining.
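The object-by-object recovery can be rehearsed safely in a scratch repository. In the sketch below, a local bare repository stands in for the remote (so git --git-dir replaces the ssh call), and the corruption is simulated by overwriting a loose object with junk. All names and paths are invented for the demonstration.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare remote.git              # stand-in for the remote repository
git clone -q remote.git work 2>/dev/null
cd work
git config user.email dev@example.org      # placeholder identity
git config user.name "Demo Developer"
echo one > f
git add f
git commit -q -m "one"
git push -q origin HEAD                    # the push completes before the "power failure"

# Simulate the damage: overwrite the newest commit object with garbage.
obj=$(git rev-parse HEAD)
path=.git/objects/$(echo "$obj" | cut -c1-2)/$(echo "$obj" | cut -c3-)
chmod u+w "$path"
echo garbage > "$path"
git fsck >/dev/null 2>&1 || echo "fsck reports a problem"

# Delete the bad object, then copy the good one back from the remote.
rm -f "$path"
restored=$(git --git-dir="$tmp/remote.git" cat-file commit "$obj" |
    git hash-object -w -t commit --stdin)
echo "$restored"
git fsck >/dev/null 2>&1 && echo "fsck is happy again"
```

git hash-object prints the id of the object it wrote, which matches the id of the deleted object because the content is byte-for-byte the same. Tree and blob objects can be restored the same way by changing the type given to cat-file and hash-object.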

I'm not sure what I would have been able to do if the remote repository had not been updated in time. I would almost certainly have lost the most recent commit, and perhaps also its immediate parent.

I really hope my UPS gets fixed soon.

Perforce did not suck

I've noticed that a lot of people in the open source world have a negative opinion of Perforce, whether they've used it or not. Here is one recent example:

There's also Perforce, which I don't know much about, but I gather it's a crappy proprietary centralised VCS which is worse than Subversion in pretty much every way.

This kind of offhand dismissal by people who are not familiar with Perforce is very common. When we were switching from Perforce to git for the Perl 5 source code, a lot of people assumed we wanted to do it because Perforce wasn't good enough (but it was really because the open source licensing procedure was non-trivial, and the lack of anonymous repository access was seen as inhibiting contributors; there were also objections to depending on a free-but-not-Free program).

There are other people who have used Perforce and not liked something about it. Their opinions range from reasoned critiques to poisonous rants:

[Dear Perforce… ] Fuck you, you miserable, untrustworthy, misleading, overpriced bastard. I hope your office goes up in flames along with all your off-site backups. I pray that some open source product that actually works is embraced by all the major companies and drives you out of business. I hope that no other company is duped by your salespeople into thinking you have something even remotely close in quality to the ancient and craptastic product known as CVS. Never before have I experienced so much pain in the most simplistic of version control tasks as I have since starting to work at a company that made the mistake of considering you.

I used Perforce exclusively for many years, both for large projects with many other users and small personal projects, and my experience with it was very different. I loved Perforce. I found it refreshingly simple to learn, it worked fast and unsurprisingly and well, and it had excellent support and documentation (of the kind that few open source programs of any kind have, even now). I encountered only two or three minor bugs in it after several years of use, and I never once had to fix the repository (a welcome change from CVS).

There are, of course, many valid criticisms of Perforce, and my intention is not to defend it against those. I've suffered from some of its problems myself: its (mostly justifiable) dependence on the network was at odds with my very slow dialup link, p4p (the proxy) didn't work very well for me, some administrators I know had problems configuring their server the way they wanted, and so on. I switched to git myself a few years ago, and later helped other projects (Perl, Archiveopteryx) I cared about to move away from Perforce too. I haven't regretted the change.

But Perforce certainly did not suck, and there are some things I still miss about it. As non-distributed VCSes go, I think Perforce is vastly better than the (many) other programs I've used.

Updating last-modified dates with a git hook

I wrote a git post-commit hook that looks at certain files in my repository whenever I change them, edits them a bit if it wants to, and commits any changes it made. The changes it makes are not very interesting, but such a hook could, for example, be used to maintain "Last modified: ..." lines in static HTML files as shown below.

Let's say we want to update all foo/*.html files that contain something like the following line:

<div class=lastmod>Last modified: ...</div>

The idea is simple: use git diff --name-only HEAD^ HEAD to get a list of the files that were changed by the last commit, pick the ones we're interested in, edit them using sed, and commit any changes we make.
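Here is one way that idea might look, wrapped in a demo that installs the hook into a scratch repository and triggers it. This is a sketch, not necessarily the hook I actually use: the LASTMOD_RUNNING guard variable, the date format, and the commit message are all assumptions made for the example. It writes through a temporary file rather than relying on sed -i, which behaves differently between sed implementations.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q site
cd site
git config user.email dev@example.org   # placeholder identity
git config user.name "Demo Developer"

mkdir -p .git/hooks
cat > .git/hooks/post-commit <<'EOF'
#!/bin/sh
# Do nothing when re-entered via the commit this hook makes itself.
[ -n "$LASTMOD_RUNNING" ] && exit 0
export LASTMOD_RUNNING=1
# The very first commit has no parent to diff against.
git rev-parse -q --verify HEAD^ >/dev/null || exit 0
today=$(date +%Y-%m-%d)
git diff --name-only HEAD^ HEAD | while read -r f; do
    case "$f" in foo/*.html) ;; *) continue ;; esac   # only files we care about
    [ -f "$f" ] || continue                           # skip deleted files
    sed "s|\(<div class=lastmod>Last modified: \).*\(</div>\)|\1$today\2|" "$f" > "$f.new"
    cat "$f.new" > "$f" && rm -f "$f.new"
    git add "$f"
done
# Commit only if an edit actually changed something.
git diff --cached --quiet || git commit -q -m "Update last-modified dates"
EOF
chmod +x .git/hooks/post-commit

mkdir foo
printf '<p>Hello</p>\n<div class=lastmod>Last modified: ...</div>\n' > foo/index.html
git add foo
git commit -q -m "Add page"             # first commit: the hook exits early
echo '<p>More</p>' >> foo/index.html
git commit -q -a -m "Edit page"         # the hook updates the date and commits again
lastmod=$(grep 'Last modified' foo/index.html)
echo "$lastmod"
```

Without some guard, a post-commit hook that commits will invoke itself; here the environment variable stops that at the first level.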

Read more…

Using Git to manage a web site

The HTML source for my (i.e., this) web site lives in a Git repository on my local workstation. This page describes how I set things up so that I can make changes live by running just "git push web".

The one-line summary: push into a remote repository that has a detached work tree, and a post-receive hook that runs "git checkout -f".
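The summary can be tried out locally before touching a real server. In this sketch, website.git plays the remote repository and the www directory plays the web server's document root; both paths are invented for the demo, and on a real server they would be wherever the repository and the site actually live.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
mkdir www                               # stand-in for the document root
git init -q --bare website.git          # stand-in for the repository on the server

# The hook checks out the pushed files into the detached work tree.
mkdir -p website.git/hooks
cat > website.git/hooks/post-receive <<EOF
#!/bin/sh
GIT_WORK_TREE="$tmp/www" git checkout -f
EOF
chmod +x website.git/hooks/post-receive

# The local repository where the site is edited.
git init -q local
cd local
git config user.email dev@example.org   # placeholder identity
git config user.name "Demo Developer"
echo '<h1>Hello</h1>' > index.html
git add index.html
git commit -q -m "First page"
git remote add web "$tmp/website.git"

# Pushing makes the change live.
git push -q web HEAD
cat "$tmp/www/index.html"
```

The checkout uses whichever branch HEAD names in the bare repository, so the branch being pushed needs to match it. In this demo both repositories are created with the same defaults, so it does.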

Read more…