Re: Backup using GiT?

From: "Ciprian Dorin Craciun" <ciprian(dot)craciun(at)gmail(dot)com>
To: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "James B(dot) Byrne" <byrnejb(at)harte-lyne(dot)ca>, pgsql-general(at)postgresql(dot)org
Subject: Re: Backup using GiT?
Date: 2008-06-14 10:50:50
Message-ID: 8e04b5820806140350p36437297p649ca238ab198ca0@mail.gmail.com
Lists: pgsql-general

On Fri, Jun 13, 2008 at 11:11 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> Tom Lane wrote:
>> "James B. Byrne" <byrnejb(at)harte-lyne(dot)ca> writes:
>
>> > GiT works by compressing deltas of the contents of successive versions of file
>> > systems under repository control. It treats binary objects as just another
>> > object under control. The question is, are successive (compressed) dumps of
>> > an altered database sufficiently similar to make the deltas small enough to
>> > warrant this approach?
>>
>> No. If you compress it, you can be pretty certain that the output will
>> be different from the first point of difference to the end of the file.
>> You'd have to work on uncompressed output, which might cost more than
>> you'd end up saving ...
>
> The other problem is that since the tables are not dumped in any
> consistent order, it's pretty unlikely that you'd get any similarity
> between two dumps of the same table. To get any benefit, you'd need to
> get pg_dump to dump sorted tuples.
>
> --
> Alvaro Herrera http://www.CommandPrompt.com/
> The PostgreSQL Company - Command Prompt, Inc.
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

The idea of using Git for backing up databases is not that bad.

I would propose the following:
-- dump the creation script into a separate file (or maybe one file
per object: table, view, function, etc.);
-- dump the content of each table into its own file;
-- dump the tuples sorted, but in plain text (as COPY data or maybe
INSERTs), as Alvaro suggested;
-- don't compress the dumps (as Tom and Chander suggested), because
Git already compresses its packed files.
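
As a rough illustration of the "one sorted plain-text file per table" idea, here is a minimal Python sketch. It is only a sketch under stated assumptions: the rows are passed in as in-memory tuples rather than fetched from a live database, the names copy_escape, render_table_dump and write_dumps are made up for this example, and the escaping only approximates COPY's text format (tab-separated columns, \N for NULL).

```python
from pathlib import Path


def copy_escape(value):
    """Render one value roughly the way COPY's text format does.

    Assumption: values are simply passed through str(); real COPY output
    formats some types differently (booleans, for instance).
    """
    if value is None:
        return r"\N"  # COPY's default NULL marker in text format
    text = str(value)
    # Escape the backslash first so the later escapes are not doubled.
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def render_table_dump(rows):
    """Return the rows as sorted, tab-separated lines of plain text."""
    lines = ["\t".join(copy_escape(v) for v in row) for row in rows]
    # Sorting the rendered lines makes the output independent of the
    # order in which the rows happened to be read.
    lines.sort()
    return "".join(line + "\n" for line in lines)


def write_dumps(tables, directory):
    """Write one uncompressed file per table (hypothetical .copy suffix)."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    for name, rows in sorted(tables.items()):
        path = out / f"{name}.copy"
        path.write_text(render_table_dump(rows), encoding="utf-8")
```

Because the lines are sorted before being written, two dumps of the same data come out byte-for-byte identical no matter what order the rows were read in, which is what keeps the deltas small.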

One advantage of using Git in the manner described above is change
tracking: with just a simple git diff you could see the modifications
(inserts, updates, deletes, schema alterations, etc.).
Going a step further, you could also do merges between multiple
databases with the same structure (each database would have its own
branch).
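
To give an idea of what a diff between two sorted plain-text dumps of one table would reveal, here is a small Python sketch. It uses the standard difflib module as a stand-in for git diff (it does not call Git at all), and the function name changed_lines and the sample data are invented for the example.

```python
import difflib


def changed_lines(old_dump, new_dump):
    """Return (removed, added) lines between two plain-text table dumps."""
    old = old_dump.splitlines()
    new = new_dump.splitlines()
    # autojunk=False keeps difflib from treating frequently repeated
    # lines as noise, which matters for large, repetitive dumps.
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(old[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(new[j1:j2])
    return removed, added


# Hypothetical dumps of one table before and after some changes.
old_dump = "1\talice\n2\tbob\n3\tcarol\n"
new_dump = "1\talice\n2\trobert\n4\tdave\n"
removed, added = changed_lines(old_dump, new_dump)
```

As with git diff, an UPDATE shows up as one removed line plus one added line, while a DELETE or an INSERT shows up as a single removed or added line.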

Just imagine how simple a database schema upgrade would be in most
situations where both the development and the deployed schema have
been modified and we want to bring them back into sync.

In conclusion, I would subscribe to such an idea.

Ciprian Craciun.
