Managing schema changes

February 01, 2013

Yesterday, I happened upon this video from PGCon 2012 of David Wheeler talking about his schema management tool named Sqitch, and thence also discovered depesz's Versioning scripts.

I have wrestled with schema migrations for many years, so I found David's presentation very interesting. Sqitch (without a "u") has many compelling features. For example, you can "sqitch deploy --untracked" to test a change you haven't committed, then revert to the last committed revision before you edit or commit the change. depesz's scripts are less magical, but offer similar capabilitites.

In particular, one thing is common to both systems: the schema exists in the repository as a number of interdependent changes, each of which must have a name (whether the names are artificial or assigned by the user is immaterial; some kind of tag is required for dependency resolution). To create the whole schema, you have to assemble the pieces in order, and to see the whole schema, you have to look at the database. The database is the canonical representation of the data model.

I prefer to think of my schema as a part of my source code, so I keep a complete version in a text file (or files), presented in the order that I want to explain it, with comments in the right places, and not leave that responsibility to "pg_dump -s".

What difference does it make?

✓ Anyone can look at the source code for the schema and understand it in the "preferred form for modifications".
✗ Making changes means writing an upgrade snippet, and downgrade snippet, and changing the main file.
✓ Creating a new instance of the database always means feeding a small number of files to psql. No need to build a big schema up step by step.
✗ Testing changes becomes harder—the upgrade/downgrade scripts are tested immediately, but in practice the main files is tested only on the next from-scratch deployment.
✓ There is no need for dependency management or complex ordering between changes. Deploying needs no cleverness, only psql. Maybe a little shell script.
✗ The natural way to represent a series of changes is with the numbered files that David so hates, and every number is an incipient merge conflict.

How these points stack up against each other depends on the situation. For example, a single web service may care less about deploying from scratch than an installable package. If the schema changes frequently, the testing overhead may outweigh other considerations. A project with one or two developers may not have to worry about numbering conflicts, and so on. Being able to read through the schema ranks highly for me.

I'll write about our approach to schema management in Archiveopteryx later.