From: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
---|---|
To: | Jacob Champion <jchampion(at)timescale(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org, gkokolatos(at)pm(dot)me, Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Dipesh Pandit <dipesh(dot)pandit(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
Subject: | Re: zstd compression for pg_dump |
Date: | 2023-03-03 18:55:46 |
Message-ID: | 20230303185546.GE12850@telsasoft.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Mar 03, 2023 at 10:32:53AM -0800, Jacob Champion wrote:
> On Sat, Feb 25, 2023 at 5:22 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> > This resolves cfbot warnings: windows and cppcheck.
> > And refactors zstd routines.
> > And updates docs.
> > And includes some fixes for earlier patches that these patches conflicts
> > with/depends on.
>
> This'll need a rebase (cfbot took a while to catch up).
Soon.
> The patchset includes basebackup modifications, which are part of a
> different CF entry; was that intended?
Yes, it's intentional - if zstd:long mode were to be merged first, then
this patch should include long mode from the start.
Or, if pgdump+zstd were merged first, then long mode could be added to
both places.
> I tried this on a local, 3.5GB, mostly-text table (from the UK Price
Thanks for looking. If your zstd library is compiled with thread
support, could you also try with :workers=N ? I believe this is working
correctly, but I'm going to ask for help verifying that...
It'd be especially useful to test under windows, where pgdump/restore
use threads instead of forking... If you have a windows environment but
not set up for development, I think it's possible to get cirrusci to
compile a patch for you and then retrieve the binaries provided as an
"artifact" (credit/blame for this idea should be directed to Thomas
Munro).
> With this particular dataset, I don't see much improvement with
> zstd:long.
Yeah. I this could be because either 1) you already got very good
comprssion without looking at more data; and/or 2) the neighboring data
is already very similar, maybe equally or more similar, than the further
data, from which there's nothing to gain.
> (At nearly double the CPU time, I get a <1% improvement in
> compression size.) I assume it's heavily data dependent, but from the
> notes on --long [2] it seems like they expect you to play around with
> the window size to further tailor it to your data. Does it make sense
> to provide the long option without the windowLog parameter?
I don't want to start exposing lots of fine-granined parameters at this
point. In the immediate case, it looks like it may require more than
just adding another parameter:
Note: If windowLog is set to larger than 27,
--long=windowLog or --memory=windowSize needs to be passed to the
decompressor.
--
Justin
From | Date | Subject | |
---|---|---|---|
Next Message | Jeroen Vermeulen | 2023-03-03 19:30:37 | Re: libpq: PQgetCopyData() and allocation overhead |
Previous Message | Tom Lane | 2023-03-03 18:53:30 | Re: libpq: PQgetCopyData() and allocation overhead |