From: Eduardo Morras <emorrasg(at)yahoo(dot)es>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Backup advice
Date: 2013-04-16 11:50:13
Message-ID: 20130416135013.94873e45a8ee1365f28d2554@yahoo.es
Lists: pgsql-general
On Mon, 15 Apr 2013 19:54:15 -0700
Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Tue, Apr 9, 2013 at 3:05 AM, Eduardo Morras
> <emorrasg(at)yahoo(dot)es> wrote:
>
> > On Mon, 8 Apr 2013 10:40:16 -0500
> > Shaun Thomas <sthomas(at)optionshouse(dot)com> wrote:
> >
> > >
> > > Anyone else?
> > >
> >
> > If his db has a low rate of inserts/updates/deletes, he can use diff between
> > pg_dumps (with the default -Fp) before compressing.
> >
>
> Most "diff" implementations will read the entirety of both files into
> memory, so they may not work well with 200GB of data unless it is broken
> into a large number of much smaller files.
>
> open-vcdiff only reads one of the files into memory, but I couldn't really
> figure out what happens memory-wise when you try to undo the resulting
> patch; the documentation is a bit mysterious.
>
> xdelta3 will "work" on streamed files of unlimited size, but it doesn't
> work very well unless the files fit in memory, or have the analogous data
> in the same order between the two files.
For my 12-13 GB dump files I use:
git diff -p 1.sql 2.sql > diff.patch
It uses about 4 MB in the first phase and up to 140 MB in the last one, and produces a patch file from which the newer dump can be rebuilt with:
patch -o 2.sql 1.sql < diff.patch
or with git apply.
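In case it helps, the full cycle looks roughly like this. It is only a sketch: the file and database names are placeholders, --no-index is just there so git diff works outside a repository, and the -o option needs GNU patch.

# two plain-text dumps taken some time apart (names are placeholders)
pg_dump -Fp mydb > 1.sql        # older dump
pg_dump -Fp mydb > 2.sql        # newer dump
# keep the older dump plus a compressed delta instead of two full dumps
git diff -p --no-index 1.sql 2.sql > diff.patch
gzip diff.patch
# later, rebuild the newer dump from the older one and the delta
gunzip diff.patch.gz
patch -o 2.sql 1.sql < diff.patch    # or apply the patch with git apply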
> A while ago I made some attempts to "co-compress" dump files, based on the
> notion that the pg_dump text format has no \n within records, so it is
> sortable as ordinary text, and that tables usually have their "stable"
> columns, like a pk, near the beginning and their volatile columns near the
> end. Sorting the lines of several dump files together therefore gathers
> duplicate or near-duplicate lines where ordinary compression algorithms can
> work their magic. So if you tag each line with its line number and which
> file it originally came from, then sort the lines (skipping the tag), you
> get much better compression. But not nearly as good as open-vcdiff,
> assuming you have the RAM to spare.
>
> Using two dumps taken months apart on a slowly-changing database, it worked
> fairly well:
>
> cat 1.sql | pigz |wc -c
> 329833147
>
> cat 2.sql | pigz |wc -c
> 353716759
>
> cat 1.sql 2.sql | pigz |wc -c
> 683548147
>
> sort -k2 <(perl -lne 'print "${.}a\t$_"' 1.sql) <(perl -lne 'print
> "${.}b\t$_"' 2.sql) | pigz |wc -c
> 436350774
>
> A particular file can then be recovered with, for example:
>
> zcat group_compressed.gz |sort -n|perl -lne 's/^(\d+b\t)// and print' >
> 2.sql2
Be careful: some of the z* utilities (zdiff, for example) decompress the whole file into /tmp first.
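If the dumps are kept gzipped, bash process substitution can feed diff from pipes instead of temporary files. This is just a sketch with placeholder names, and, as Jeff notes above, diff itself may still pull both inputs into memory:

# compare two compressed dumps without decompressed copies landing in /tmp
diff -u <(zcat 1.sql.gz) <(zcat 2.sql.gz) > diff.patch
# the patch headers will name /dev/fd/NN, so name the target file explicitly when applying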
> There are all kinds of shortcomings here, of course; it was just a quick
> and dirty proof of concept.
A nice one!
> For now, storage is cheap enough for what I need to do that this isn't
> worth fleshing out any further.
>
> Cheers,
>
> Jeff
---
Eduardo Morras <emorrasg(at)yahoo(dot)es>