From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Joel Jacobson <joel(at)trustly(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Schema version management
Date: 2012-07-10 22:24:01
Message-ID: CAHyXU0yUEQtJ3=JOzkBonFwNw1VOy4ej+xJiFopEJ2NDr0DviA@mail.gmail.com
Lists: pgsql-hackers
On Sun, May 20, 2012 at 2:41 PM, Joel Jacobson <joel(at)trustly(dot)com> wrote:
> Hi,
>
> I just read a very interesting post about "schema version management".
>
> Quote: "You could set it up so that every developer gets their own
> test database, sets up the schema there, takes a dump, and checks that
> in. There are going to be problems with that, including that dumps
> produced by pg_dump are ugly and optimized for restoring, not for
> developing with, and they don't have a deterministic output order." (
> http://petereisentraut.blogspot.com/2012/05/my-anti-take-on-database-schema-version.html
> )
>
> Back in December 2010, I suggested a new option to pg_dump, --split,
> which would write the schema definition of each object in separate
> files:
>
> http://archives.postgresql.org/pgsql-hackers/2010-12/msg02285.php
>
> Instead of one huge plain-text schema file, which is impractical to
> version control, all tables/sequences/views/functions are written to
> separate files, allowing a version control system, such as git, to do
> proper version control.
>
> The "deterministic output order" problem mentioned in the post above
> is not a problem if each object (table/sequence/view/function/etc.) is
> written to the same filename every time: no matter the order in which
> objects are dumped, the tree of files and their content will be
> identical.
>
> I remember a lot of hackers were very positive about this option, but
> we somehow failed to agree on the naming of files in the tree
> structure. I'm sure we can work that out though.
>
> I use this feature in production: a cronjob dumps the schema every
> hour, committing any changes to a separate git branch for each
> database installation (production, development and test).
> If no changes to the schema have been made, nothing is committed
> to git, since none of the files have changed.
>
> It is then drop-dead simple to diff two branches of the database
> schema, such as development and production, or to diff different
> revisions, allowing point-in-time comparison of the schema.
>
> This is an example of the output of git log --summary for one of the
> automatic commits to our production database's git repo:
>
> --
> commit 18c31f8162d851b0dac3bad7e80529ef2ed18be3
> Author: Production Database <production(dot)database(at)trustly(dot)com>
> Date: Fri May 4 15:00:04 2012 +0200
>
> Update of database schema Linux DB0 2.6.26-2-amd64 #1 SMP Wed Aug
> 19 22:33:18 UTC 2009 x86_64 GNU/Linux Fri, 04 May 2012 15:00:04 +0200
>
> create mode 100644 gluepay-split/public/CONSTRAINT/openingclosingbalances_pkey.sql
> create mode 100644 gluepay-split/public/CONSTRAINT/openingclosingbalances_source_key.sql
> create mode 100644 gluepay-split/public/SEQUENCE/seqopeningclosingbalance.sql
> create mode 100644 gluepay-split/public/TABLE/openingclosingbalances.sql
> --
>
> Here we can see we apparently deployed a new table,
> "openingclosingbalances" around Fri May 4 15:00:04.
>
> Without any manual work, I'm able to follow all changes actually
> _deployed_ in each database.
>
> At my company, a highly database-centric, stored-procedure-intensive
> business dealing with mission-critical monetary transactions, we've
> been using this technique to do schema version management without any
> hassle for the last two years.
>
> Hopefully this can add to the list of various possible _useful_ schema
> version management methods.
What does your patch do that you can't already do with pg_restore?
create function foo(a int, b int, c text) returns int as $$ select 0;
$$ language sql;
CREATE FUNCTION
pg_dump -Fc postgres -s > postgres.dump
pg_restore -l postgres.dump | grep FUNCTION
196; 1255 32939 FUNCTION public foo(integer, integer, text) merlin
pg_restore -P "foo(integer, integer, text)" postgres.dump
<function body follows>
it's fairly easy to wrap pg_restore with a small script that extracts
function bodies and writes them out to per-object files. this is a
great and underused feature, so I'd argue that if you want to formalize
per-object file extraction you should be looking at expanding
pg_restore, not pg_dump.
merlin
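
A minimal sketch of the wrapper Merlin suggests: parse the pg_restore -l
listing into one file path per object, then extract each body with
pg_restore -P. The TOC line below is canned (taken from the example in
the message) so the sketch runs without a live database, and the
schema/<schema>/<type>/<name>.sql layout is an assumption, not anything
pg_restore itself produces.

```shell
#!/bin/sh
# Canned `pg_restore -l` TOC entry; in practice you would pipe in:
#   pg_restore -l postgres.dump
toc='196; 1255 32939 FUNCTION public foo(integer, integer, text) merlin'

# Derive a per-object path: schema/<schema>/<type>/<name>.sql
path=$(echo "$toc" | awk '{
  name = $6; sub(/\(.*/, "", name)          # strip the argument list
  printf "schema/%s/%s/%s.sql", $5, $4, name
}')
echo "$path"   # prints: schema/public/FUNCTION/foo.sql
# The body would then be written with, e.g.:
#   pg_restore -P "foo(integer, integer, text)" postgres.dump > "$path"
```

Since the path is the same on every run, re-running the wrapper over a
fresh dump and committing the tree gives the deterministic, diffable
layout discussed earlier in the thread.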