Re: Add LZ4 compression in pg_dump

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: gkokolatos(at)pm(dot)me, Michael Paquier <michael(at)paquier(dot)xyz>, shiy(dot)fnst(at)fujitsu(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2023-03-02 18:27:00
Message-ID: cb3d5dbd-988e-7628-a3b2-d3b4a66729d5@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/2/23 18:18, Justin Pryzby wrote:
> On Wed, Mar 01, 2023 at 05:20:05PM +0100, Tomas Vondra wrote:
>> On 2/25/23 15:05, Justin Pryzby wrote:
>>> On Fri, Feb 24, 2023 at 11:02:14PM -0600, Justin Pryzby wrote:
>>>> I have some fixes (attached) and questions while polishing the patch for
>>>> zstd compression. The fixes are small and could be integrated with the
>>>> patch for zstd, but could be applied independently.
>>>
>>> One more - WriteDataToArchiveGzip() says:
>>>
>>> + if (cs->compression_spec.level == 0)
>>> + pg_fatal("requested to compress the archive yet no level was specified");
>>>
>>> That was added at e9960732a.
>>>
>>> But if you specify gzip:0, the compression level is already enforced by
>>> validate_compress_specification(), before hitting gzip.c:
>>>
>>> | pg_dump: error: invalid compression specification: compression algorithm "gzip" expects a compression level between 1 and 9 (default at -1)
>>>
>>> 5e73a6048 intended that to work as before, and you *can* specify -Z0:
>>>
>>> The change is backward-compatible, hence specifying only an integer
>>> leads to no compression for a level of 0 and gzip compression when the
>>> level is greater than 0.
>>>
>>> $ time ./src/bin/pg_dump/pg_dump -h /tmp regression -t int8_tbl -Fp --compress 0 |file -
>>> /dev/stdin: ASCII text
>>
>> FWIW I agree we should make this backwards-compatible - accept "0" and
>> treat it as no compression.
>>
>> Georgios, can you prepare a patch doing that?
>
> I think maybe Tomas misunderstood. What I was trying to say is that -Z
> 0 *is* accepted to mean no compression. This part wasn't quoted, but I
> said:
>

Ah, I see. Well, I also tried but with "-Z gzip:0" (and not -Z 0), and
that does fail:

error: invalid compression specification: compression algorithm "gzip"
expects a compression level between 1 and 9 (default at -1)

It's a bit weird these two cases behave differently, when both translate
to the same default compression method (gzip).

>> Right now, I think that pg_fatal in gzip.c is dead code - that was first
>> added in the patch version sent on 21 Dec 2022.
>
> If you run the diff command that I've been talking about, you'll see
> that InitCompressorZlib was almost unchanged - e9960732 is essentially a
> refactoring. I don't think it's desirable to add a pg_fatal() in a
> function that's otherwise nearly-unchanged. The fact that it's
> nearly-unchanged is a good thing: it simplifies reading of what changed.
> If someone wants to add a pg_fatal() in that code path, it'd be better
> done in its own commit, with a separate message explaining the change.
>
> If you insist on changing anything here, you might add an assertion (as
> you said earlier) along with a comment like
> /* -Z 0 uses the "None" compressor rather than zlib with no compression */
>

Yeah, a comment would be helpful.

Also, after thinking about it a bit more maybe having the unreachable
pg_fatal() is not a good thing, as it will just confuse people (I'd
certainly assume having such check means there's a way in which case it
might trigger.). Maybe an assert would be better?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ankit Kumar Pandey 2023-03-02 18:32:21 Re: Sort optimizations: Making in-memory sort cache-aware
Previous Message Pavel Borisov 2023-03-02 18:17:19 Re: POC: Lock updated tuples in tuple_update() and tuple_delete()