From: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
---|---|
To: | Eduardo Morras <emorrasg(at)yahoo(dot)es> |
Cc: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>, "sthomas(at)optionshouse(dot)com" <sthomas(at)optionshouse(dot)com> |
Subject: | Backup advice |
Date: | 2013-04-16 02:54:15 |
Message-ID: | CAMkU=1z0+=M-2g-N3+y=d-QJu-qcYTGXbw_h1E5g5Gr+rCny6A@mail.gmail.com |
Lists: | pgsql-general |
On Tue, Apr 9, 2013 at 3:05 AM, Eduardo Morras <emorrasg(at)yahoo(dot)es> wrote:
> On Mon, 8 Apr 2013 10:40:16 -0500
> Shaun Thomas <sthomas(at)optionshouse(dot)com> wrote:
>
> >
> > Anyone else?
> >
>
> If his db has low inserts/updates/deletes he can use diff between pg_dumps
> (with default -Fp) before compressing.
>
Most "diff" implementations will read the entirety of both files into
memory, so may not work well with 200GB of data, unless it is broken into a
large number of much smaller files.
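(For concreteness, the basic diff-then-compress idea being discussed looks
something like the sketch below; the file names are just placeholders, and the
memory caveat above still applies, since plain diff holds both dumps in RAM.)
pg_dump -Fp mydb > dump_new.sql
diff dump_old.sql dump_new.sql | pigz > delta.diff.gz
# rebuild the newer dump from the older one plus the delta
zcat delta.diff.gz | patch -o dump_new_restored.sql dump_old.sql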
open-vcdiff only reads one of the files into memory, but I couldn't really
figure out what happens memory-wise when you try to apply the resulting patch;
the documentation is a bit mysterious.
xdelta3 will "work" on streamed files of unlimited size, but it doesn't
work very well unless the files fit in memory, or have the analogous data
in the same order between the two files.
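(For reference, the basic xdelta3 usage I'm talking about is roughly the
following; the file names are placeholders, -e encodes, -d decodes, and -s
names the source file the delta is computed against.)
xdelta3 -e -s 1.sql 2.sql delta.xd3       # delta that turns 1.sql into 2.sql
xdelta3 -d -s 1.sql delta.xd3 2.sql.out   # reconstruct 2.sql from 1.sql + delta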
A while ago I made some attempts to "co-compress" dump files, based on two
observations: the pg_dump text format has no \n within records, so it is
sortable as ordinary text; and tables usually have their "stable" columns,
like a pk, near the beginning of each row and their volatile columns near the
end. Sorting the lines of several dump files together therefore gathers
duplicate or near-duplicate lines, where ordinary compression algorithms can
work their magic. So if you tag each line with its line number and the file it
originally came from, then sort the lines (skipping the tag), you get much
better compression than compressing the concatenated files. Still not nearly
as good as open-vcdiff, though, assuming you have the RAM to spare for that.
Using two dumps taken months apart on a slowly-changing database, it worked
fairly well:
cat 1.sql | pigz |wc -c
329833147
cat 2.sql | pigz |wc -c
353716759
cat 1.sql 2.sql | pigz |wc -c
683548147
sort -k2 <(perl -lne 'print "${.}a\t$_"' 1.sql) <(perl -lne 'print "${.}b\t$_"' 2.sql) | pigz |wc -c
436350774
Either file can then be recovered; for example, to get 2.sql back:
zcat group_compressed.gz | sort -n | perl -lne 's/^(\d+b\t)// and print' > 2.sql2
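(And analogously for the first file, keeping the 'a'-tagged lines instead:)
zcat group_compressed.gz | sort -n | perl -lne 's/^(\d+a\t)// and print' > 1.sql2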
There are all kinds of shortcomings here, of course; it was just a quick and
dirty proof of concept.
For now I think storage is cheap enough for what I need to do that it isn't
worth fleshing this out any further.
Cheers,
Jeff