Managing schema changes

By Abhijit Menon-Sen <ams@toroid.org>

2013-02-01

Yesterday, I happened upon this video from PGCon 2012 of David Wheeler talking about his schema management tool named Sqitch, and thence also discovered depesz's Versioning scripts.

I have wrestled with schema migrations for many years, so I found David's presentation very interesting. Sqitch (without a "u") has many compelling features. For example, you can "sqitch deploy --untracked" to test a change you haven't committed, then revert to the last committed revision before you edit or commit the change. depesz's scripts are less magical, but offer similar capabilitites.

In particular, one thing is common to both systems: the schema exists in the repository as a number of interdependent changes, each of which must have a name (whether the names are artificial or assigned by the user is immaterial; some kind of tag is required for dependency resolution). To create the whole schema, you have to assemble the pieces in order, and to see the whole schema, you have to look at the database. The database is the canonical representation of the data model.

I prefer to think of my schema as a part of my source code, so I keep a complete version in a text file (or files), presented in the order that I want to explain it, with comments in the right places, and not leave that responsibility to "pg_dump -s".

What difference does it make?

How these points stack up against each other depends on the situation. For example, a single web service may care less about deploying from scratch than an installable package. If the schema changes frequently, the testing overhead may outweigh other considerations. A project with one or two developers may not have to worry about numbering conflicts, and so on. Being able to read through the schema ranks highly for me.

I'll write about our approach to schema management in Archiveopteryx later.