Re: Add LZ4 compression in pg_dump

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: gkokolatos(at)pm(dot)me, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Rachel Heaton <rachelmheaton(at)gmail(dot)com>
Subject: Re: Add LZ4 compression in pg_dump
Date: 2023-01-25 17:28:49
Message-ID: 15b255d0-8868-a020-f6a9-7324f82a28b6@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 1/25/23 16:37, gkokolatos(at)pm(dot)me wrote:
>
>
>
>
>
> ------- Original Message -------
> On Wednesday, January 25th, 2023 at 2:42 AM, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>
>
>>
>>
>> On Tue, Jan 24, 2023 at 03:56:20PM +0000, gkokolatos(at)pm(dot)me wrote:
>>
>>> On Monday, January 23rd, 2023 at 7:00 PM, Justin Pryzby pryzby(at)telsasoft(dot)com wrote:
>>>
>>>> On Mon, Jan 23, 2023 at 05:31:55PM +0000, gkokolatos(at)pm(dot)me wrote:
>>>>
>>>>> Please find attached v23 which reintroduces the split.
>>>>>
>>>>> 0001 is reworked to have a reduced footprint than before. Also in an attempt
>>>>> to facilitate the readability, 0002 splits the API's and the uncompressed
>>>>> implementation in separate files.
>>>>
>>>> Thanks for updating the patch. Could you address the review comments I
>>>> sent here ?
>>>> https://www.postgresql.org/message-id/20230108194524.GA27637%40telsasoft.com
>>>
>>> Please find v24 attached.
>>
>>
>> Thanks for updating the patch.
>>
>> In 001, RestoreArchive() does:
>>
>>> -#ifndef HAVE_LIBZ
>>> - if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP &&
>>> - AH->PrintTocDataPtr != NULL)
>>> + supports_compression = false;
>>> + if (AH->compression_spec.algorithm == PG_COMPRESSION_NONE ||
>>> + AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
>>> + supports_compression = true;
>>> +
>>> + if (AH->PrintTocDataPtr != NULL)
>>> {
>>> for (te = AH->toc->next; te != AH->toc; te = te->next)
>>> {
>>> if (te->hadDumper && (te->reqs & REQ_DATA) != 0)
>>> - pg_fatal("cannot restore from compressed archive (compression not supported in this installation)");
>>> + {
>>> +#ifndef HAVE_LIBZ
>>> + if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
>>> + supports_compression = false;
>>> +#endif
>>> + if (supports_compression == false)
>>> + pg_fatal("cannot restore from compressed archive (compression not supported in this installation)");
>>> + }
>>> }
>>> }
>>> -#endif
>>
>>
>> This first checks if the algorithm is implemented, and then checks if
>> the algorithm is supported by the current build - that confused me for a
>> bit. It seems unnecessary to check for unimplemented algorithms before
>> looping. That also requires referencing both GZIP and LZ4 in two
>> places.
>
> I am not certain that it is unnecessary, at least not in the way that is
> described. The idea is that new compression methods can be added, without
> changing the archive's version number. It is very possible that it is
> requested to restore an archive compressed with a method not implemented
> in the current binary. The first check takes care of that and sets
> supports_compression only for the supported versions. It is possible to
> enter the loop with supports_compression already set to false, for example
> because the archive was compressed with ZSTD, triggering the fatal error.
>
> Of course, one can throw the error before entering the loop, yet I think
> that it does not help the readability of the code. IMHO it is easier to
> follow if the error is thrown once during that check.
>

Actually, I don't understand why 0001 moves the check into the loop. I
mean, why not check HAVE_LIBZ before the loop?

>>
>> I think it could be written to avoid the need to change for added
>> compression algorithms:
>>
>> + if (te->hadDumper && (te->reqs & REQ_DATA) != 0)
>>
>> + {
>> + /* Check if the compression algorithm is supported */
>> + pg_compress_specification spec;
>> + parse_compress_specification(AH->compression_spec.algorithm, NULL, &spec);
>>
>> + if (spec->parse_error != NULL)
>>
>> + pg_fatal(spec->parse_error);
>>
>> + }
>
> I am not certain how that would work in the example with ZSTD above.
> If I am not wrong, parse_compress_specification() will not throw an error
> if the codebase supports ZSTD, yet this specific pg_dump binary will not
> support it because ZSTD is not implemented. parse_compress_specification()
> is not aware of that and should not be aware of it, should it?
>

Not sure. What happens in a similar situation now? That is, when trying
to deal with an archive gzip-compressed in a build without libz?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2023-01-25 17:34:00 Re: CREATE ROLE bug?
Previous Message Robert Haas 2023-01-25 17:21:14 Re: CREATE ROLE bug?