Re: Add LZ4 compression in pg_dump

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: gkokolatos(at)pm(dot)me, shiy(dot)fnst(at)fujitsu(dot)com, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2023-02-28 23:58:34
Message-ID: 20230228235834.GC30529@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 23, 2023 at 09:24:46PM +0100, Tomas Vondra wrote:
> On 2/23/23 16:26, Tomas Vondra wrote:
> > Thanks for v30 with the updated commit messages. I've pushed 0001 after
> > fixing a comment typo and removing (I think) an unnecessary change in an
> > error message.
> >
> > I'll give the buildfarm a bit of time before pushing 0002 and 0003.
> >
>
> I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.),
> and marked the CF entry as committed. Thanks for the patch!

I found that e9960732a broke writing of empty gzip-compressed data,
specifically LOs. pg_dump succeeds, but then the restore fails:

postgres=# SELECT lo_create(1234);
lo_create | 1234

$ time ./src/bin/pg_dump/pg_dump -h /tmp -d postgres -Fc |./src/bin/pg_dump/pg_restore -f /dev/null -v
pg_restore: implied data-only restore
pg_restore: executing BLOB 1234
pg_restore: processing BLOBS
pg_restore: restoring large object with OID 1234
pg_restore: error: could not uncompress data: (null)

The inline patch below fixes it, but you won't be able to apply it
directly, as it's on top of other patches which rename the functions
back to "Zlib" and rearranges the functions to their original order, to
allow running:

git diff --diff-algorithm=minimal -w e9960732a~:./src/bin/pg_dump/compress_io.c ./src/bin/pg_dump/compress_gzip.c

The current function order avoids 3 lines of declarations, but it's
obviously pretty useful to be able to run that diff command. I already
argued for not calling the functions "Gzip" on the grounds that the name
was inaccurate.

I'd want to create an empty large object in src/test/sql/largeobject.sql
to exercise this tested during pgupgrade. But unfortunately that
doesn't use -Fc, so this isn't hit. Empty input is an important enough
test case to justify a tap test, if there's no better way.

diff --git a/src/bin/pg_dump/compress_gzip.c b/src/bin/pg_dump/compress_gzip.c
index f3f5e87c9a8..68f3111b2fe 100644
--- a/src/bin/pg_dump/compress_gzip.c
+++ b/src/bin/pg_dump/compress_gzip.c
@@ -55,6 +55,32 @@ InitCompressorZlib(CompressorState *cs,
gzipcs = (ZlibCompressorState *) pg_malloc0(sizeof(ZlibCompressorState));

cs->private_data = gzipcs;
+
+ if (cs->writeF)
+ {
+ z_streamp zp;
+ zp = gzipcs->zp = (z_streamp) pg_malloc0(sizeof(z_stream));
+ zp->zalloc = Z_NULL;
+ zp->zfree = Z_NULL;
+ zp->opaque = Z_NULL;
+
+ /*
+ * outsize is the buffer size we tell zlib it can output to. We
+ * actually allocate one extra byte because some routines want to append a
+ * trailing zero byte to the zlib output.
+ */
+
+ gzipcs->outbuf = pg_malloc(ZLIB_OUT_SIZE + 1);
+ gzipcs->outsize = ZLIB_OUT_SIZE;
+
+ if (deflateInit(gzipcs->zp, cs->compression_spec.level) != Z_OK)
+ pg_fatal("could not initialize compression library: %s",
+ zp->msg);
+
+ /* Just be paranoid - maybe End is called after Start, with no Write */
+ zp->next_out = gzipcs->outbuf;
+ zp->avail_out = gzipcs->outsize;
+ }
}

static void
@@ -63,7 +89,7 @@ EndCompressorZlib(ArchiveHandle *AH, CompressorState *cs)
ZlibCompressorState *gzipcs = (ZlibCompressorState *) cs->private_data;
z_streamp zp;

- if (gzipcs->zp)
+ if (cs->writeF != NULL)
{
zp = gzipcs->zp;
zp->next_in = NULL;
@@ -131,29 +157,6 @@ WriteDataToArchiveZlib(ArchiveHandle *AH, CompressorState *cs,
const void *data, size_t dLen)
{
ZlibCompressorState *gzipcs = (ZlibCompressorState *) cs->private_data;
- z_streamp zp;
-
- if (!gzipcs->zp)
- {
- zp = gzipcs->zp = (z_streamp) pg_malloc(sizeof(z_stream));
- zp->zalloc = Z_NULL;
- zp->zfree = Z_NULL;
- zp->opaque = Z_NULL;
-
- /*
- * outsize is the buffer size we tell zlib it can output to. We
- * actually allocate one extra byte because some routines want to
- * append a trailing zero byte to the zlib output.
- */
- gzipcs->outbuf = pg_malloc(ZLIB_OUT_SIZE + 1);
- gzipcs->outsize = ZLIB_OUT_SIZE;
-
- if (deflateInit(zp, cs->compression_spec.level) != Z_OK)
- pg_fatal("could not initialize compression library: %s", zp->msg);
-
- zp->next_out = gzipcs->outbuf;
- zp->avail_out = gzipcs->outsize;
- }

gzipcs->zp->next_in = (void *) unconstify(void *, data);
gzipcs->zp->avail_in = dLen;

Attachment Content-Type Size
0001-Rename-functions-structures-comments-which-don-t-use.patch text/x-diff 8.5 KB
0002-also-rearrange-the-functions-to-their-original-order.patch text/x-diff 4.6 KB
0003-pg_dump-call-deflateInit-in-Init-rather-than-in-Writ.patch text/x-diff 3.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-03-01 00:09:57 Re: Doc update for pg_stat_statements normalization
Previous Message Jacob Champion 2023-02-28 23:49:07 Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert