From: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Cc: | gkokolatos(at)pm(dot)me, Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Dipesh Pandit <dipesh(dot)pandit(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Jacob Champion <jchampion(at)timescale(dot)com> |
Subject: | Re: zstd compression for pg_dump |
Date: | 2023-03-05 17:47:58 |
Message-ID: | 20230305174758.GA11636@telsasoft.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, Feb 25, 2023 at 07:22:27PM -0600, Justin Pryzby wrote:
> On Fri, Feb 24, 2023 at 01:18:40PM -0600, Justin Pryzby wrote:
> > This is a draft patch - review is welcome and would help to get this
> > ready to be considererd for v16, if desired.
> >
> > I'm going to add this thread to the old CF entry.
> > https://commitfest.postgresql.org/31/2888/
>
> This resolves cfbot warnings: windows and cppcheck.
> And refactors zstd routines.
> And updates docs.
> And includes some fixes for earlier patches that these patches conflicts
> with/depends on.
This rebases over the TAP and doc fixes to LZ4.
And adds necessary ENV to makefile and meson.
And adds an annoying boilerplate header.
And removes supports_compression(), which is what I think Tomas meant
when referring to "annoying unsupported cases".
And updates zstd.c: fix an off-by-one, allocate in init depending on
readF/writeF, do not reset the input buffer on each iteration, and show
parameter name in errors.
I'd appreciate help checking that this is doing the right things and
works correctly with zstd threaded workers. The zstd API says: "use one
different context per thread for parallel execution" and "For parallel
execution, use one separate ZSTD_CStream per thread".
https://github.com/facebook/zstd/blob/dev/lib/zstd.h
I understand that to mean that, if pg_dump *itself* were using threads,
then each thread would need to call ZSTD_createCStream(). pg_dump isn't
threaded, so there's nothing special needed, right?
Except that, under windows, pg_dump -Fd -j actually uses threads instead
of forking. I *think* that's still safe, since the pgdump threads are
created *before* calling zstd functions (see _PrintTocData and
_StartData of the custom and directory formats), so it happens naturally
that there's a separate zstd stream for each thread of pgdump.
--
Justin
Attachment | Content-Type | Size |
---|---|---|
0001-pg_dump-zstd-compression.patch | text/x-diff | 28.8 KB |
0002-TMP-pg_dump-use-Zstd-by-default-for-CI-only.patch | text/x-diff | 2.2 KB |
0003-zstd-support-long-distance-mode-in-pg_dump-basebacku.patch | text/x-diff | 13.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2023-03-05 17:54:31 | Re: Date-Time dangling unit fix |
Previous Message | Tom Lane | 2023-03-05 16:51:05 | Re: [Question] Similar Cost but variable execution time in sort |