Yesterday, I happened upon
this video from
PGCon 2012
of David Wheeler talking about his schema management tool named
Sqitch,
and thence also discovered
depesz's Versioning
scripts.
I have wrestled with schema migrations for many years, so I found
David's presentation very interesting. Sqitch (without a "u") has many
compelling features. For example, you can "sqitch deploy --untracked" to
test a change you haven't committed, then revert to the last committed
revision before you edit or commit the change. depesz's scripts are less
magical, but offer similar capabilitites.
In particular, one thing is common to both systems: the schema exists in
the repository as a number of interdependent changes, each of which must
have a name (whether the names are artificial or assigned by the user is
immaterial; some kind of tag is required for dependency resolution). To
create the whole schema, you have to assemble the pieces in order, and
to see the whole schema, you have to look at the database. The database
is the canonical representation of the data model.
I prefer to think of my schema as a part of my source code, so I keep a
complete version in a text file (or files), presented in the order that
I want to explain it, with comments in the right places, and not leave
that responsibility to "pg_dump -s".
What difference does it make?
- ✓ Anyone can look at the source code for the schema and understand
it in the "preferred form for modifications".
- ✗ Making changes means writing an upgrade snippet, and downgrade
snippet, and changing the main file.
- ✓ Creating a new instance of the database always means feeding a
small number of files to psql. No need to build a big schema up step
by step.
- ✗ Testing changes becomes harder—the upgrade/downgrade scripts are
tested immediately, but in practice the main files is tested only on
the next from-scratch deployment.
- ✓ There is no need for dependency management or complex ordering
between changes. Deploying needs no cleverness, only psql. Maybe a
little shell script.
- ✗ The natural way to represent a series of changes is with the
numbered files that David so hates, and every number is an incipient
merge conflict.
How these points stack up against each other depends on the situation.
For example, a single web service may care less about deploying from
scratch than an installable package. If the schema changes frequently,
the testing overhead may outweigh other considerations. A project with
one or two developers may not have to worry about numbering conflicts,
and so on. Being able to read through the schema ranks highly for me.
I'll write about our approach to schema management in
Archiveopteryx
later.