Quick Links

Schema version management

From:	Joel Jacobson <joel(at)trustly(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Schema version management
Date:	2012-05-20 19:41:56
Message-ID:	CAASwCXdKROy2UHLof4Zvxx00goSccqXkg_g4Yf9vOuK8aXWh0g@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

I just read a very interesting post about "schema version management".

Quote: "You could set it up so that every developer gets their own
test database, sets up the schema there, takes a dump, and checks that
in. There are going to be problems with that, including that dumps
produced by pg_dump are ugly and optimized for restoring, not for
developing with, and they don't have a deterministic output order." (
http://petereisentraut.blogspot.com/2012/05/my-anti-take-on-database-schema-version.html
)

Back in December 2010, I suggested a new option to pg_dump, --split,
which would write the schema definition of each object in separate
files:

http://archives.postgresql.org/pgsql-hackers/2010-12/msg02285.php

Instead of a huge plain text schema file, impossible to version
control, all tables/sequences/views/functions are written to separate
files, allowing the use of a version control software system, such as
git, to do proper version controlling.

The "deterministic output order" problem mentioned in the post above,
is not a problem if each object (table/sequence/view/function/etc) is
written to the same filename everytime.
No matter the order, the tree of files and their content will be
identical, no matter the order in which they are dumped.

I remember a lot of hackers were very positive about this option, but
we somehow failed to agree on the naming of files in the tree
structure. I'm sure we can work that out though.

I use this feature in production, I have a cronjob which does a dump
of the schema every hour, committing any eventual changes to a
separate git branch for each database installation, such as
production, development and test.
If no changes to the schema have been made, nothing will be committed
to git since none of the files have changed.

It is then drop-dead simple to diff two different branches of the
database schema, such as development or production, or diffing
different revisions allowing point-in-time comparison of the schema.

This is an example of the otuput of a git log --summary for one of the
automatic commits to our production database's git-repo:

--
commit 18c31f8162d851b0dac3bad7e80529ef2ed18be3
Author: Production Database <production(dot)database(at)trustly(dot)com>
Date: Fri May 4 15:00:04 2012 +0200

Update of database schema Linux DB0 2.6.26-2-amd64 #1 SMP Wed Aug
19 22:33:18 UTC 2009 x86_64 GNU/Linux Fri, 04 May 2012 15:00:04 +0200

create mode 100644
gluepay-split/public/CONSTRAINT/openingclosingbalances_pkey.sql
create mode 100644
gluepay-split/public/CONSTRAINT/openingclosingbalances_source_key.sql
create mode 100644 gluepay-split/public/SEQUENCE/seqopeningclosingbalance.sql
create mode 100644 gluepay-split/public/TABLE/openingclosingbalances.sql
--

Here we can see we apparently deployed a new table,
"openingclosingbalances" around Fri May 4 15:00:04.

Without any manual work, I'm able to follow all changes actually
_deployed_ in each database.

At my company, a highly database-centric stored-procedure intensive
business dealing with mission-critical monetary transactions, we've
been using this technique to successfully do schema version management
without any hassle for the last two years.

Hopefully this can add to the list of various possible _useful_ schema
version management methods.

Best regards,

Joel Jacobson

Responses

Re: Schema version management at 2012-05-21 01:08:18 from Daniel Farina
Re: Schema version management at 2012-07-10 22:24:01 from Merlin Moncure

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Robert Haas	2012-05-20 23:57:06	Re: Why is indexonlyscan so darned slow?
Previous Message	David Fetter	2012-05-20 19:30:52	Re: Remove readline notice from psql --version?