From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Filip Janus <fjanus(at)redhat(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Adding compression of temporary files
Date: 2025-03-17 22:13:06
Message-ID: 7bcc0420-d457-4af5-a459-a4e5d929a665@vondra.me
Lists: pgsql-hackers
On 3/15/25 11:40, Alexander Korotkov wrote:
> On Sun, Jan 5, 2025 at 1:43 AM Filip Janus <fjanus(at)redhat(dot)com> wrote:
>>
>> I apologize for multiple messages, but I found a small bug in the previous version.
>>
>> -Filip-
>
> Great, thank you for your work.
>
> I think the patches could use a pgindent run.
>
> I don't see a reason why the temp file compression method should be
> different from the wal compression methods, which we already have
> in-tree. Perhaps it would be nice to have a 0001 patch, which would
> abstract the compression methods we now have for wal into a separate
> file containing GUC option values and functions for
> compress/decompress. Then, 0002 would apply this to temporary file
> compression.
>
Not sure I understand the design you're proposing ...
AFAIK the WAL compression is not compressing the file data directly,
it's compressing backup blocks one by one, which then get written to WAL
as one piece of a record. So it's dealing with individual blocks, not
files, and we already have an API to compress blocks (well, it's pretty
much the APIs of the individual compression methods).
You're proposing abstracting that into a separate file - what would be
in that file? How would you abstract this to make it also useful for
file compression?
I can imagine a function CompressBuffer(method, dst, src, ...) wrapping
the various compression methods, unifying the error handling, etc. But
that API is also limiting - e.g. how would it work with stream
compression, which seems irrelevant for WAL, but might be very useful
for tempfile compression?
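To sketch what I mean (just a sketch - the enum and function names are
made up, nothing like this exists in-tree, and the caller is assumed to
size dst appropriately, e.g. PGLZ_MAX_OUTPUT for pglz):

#include "postgres.h"
#include "common/pg_lzcompress.h"
#ifdef USE_LZ4
#include <lz4.h>
#endif
#ifdef USE_ZSTD
#include <zstd.h>
#endif

/* hypothetical shared enum, mirroring the wal_compression option values */
typedef enum pg_compress_method
{
    PG_COMPRESS_NONE,
    PG_COMPRESS_PGLZ,
    PG_COMPRESS_LZ4,
    PG_COMPRESS_ZSTD
} pg_compress_method;

/*
 * Compress srclen bytes from src into dst (dstlen bytes available).
 * Returns the compressed length, or -1 if the data is incompressible.
 */
static int32
CompressBuffer(pg_compress_method method, char *dst, int32 dstlen,
               const char *src, int32 srclen)
{
    switch (method)
    {
        case PG_COMPRESS_PGLZ:
            return pglz_compress(src, srclen, dst, PGLZ_strategy_default);
#ifdef USE_LZ4
        case PG_COMPRESS_LZ4:
            {
                int     len;

                len = LZ4_compress_default(src, dst, srclen, dstlen);
                return (len <= 0) ? -1 : len;
            }
#endif
#ifdef USE_ZSTD
        case PG_COMPRESS_ZSTD:
            {
                size_t  len;

                len = ZSTD_compress(dst, dstlen, src, srclen,
                                    ZSTD_CLEVEL_DEFAULT);
                return ZSTD_isError(len) ? -1 : (int32) len;
            }
#endif
        default:
            elog(ERROR, "unsupported compression method %d", (int) method);
            return -1;          /* keep compiler quiet */
    }
}

But this only covers the one-shot case - a streaming compressor has to
keep state across calls, so the tempfile side would need a different
(or richer) interface.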
IIRC this is mostly why we didn't try to build such a generic API for
pg_dump compression - there's a local pg_dump-specific abstraction instead.
FWIW, looking at the patch, I still don't quite understand why it needs
to correct the offset like this:
+    if (!file->compress)
+        file->curOffset -= (file->nbytes - file->pos);
+    else
+        if (nbytesOriginal - file->pos != 0)
+            /*
+             * curOffset must be corrected also if compression is
+             * enabled, nbytes was changed by compression but we
+             * have to use the original value of nbytes
+             */
+            file->curOffset -= bytestowrite;
It's not something introduced by the compression patch - the first part
is what we used to do before. But I find it a bit confusing - isn't it
mixing the "logical file position" adjustment we did before with the
additional adjustment possibly needed due to compression?
In fact, isn't it going to fail if the code goes through multiple
iterations of
while (wpos < file->nbytes)
{
...
}
because bytestowrite will be the value from the last iteration? I haven't
tried, but I guess writing wide tuples (more than 8kB) might fail.
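To spell out the failure mode I have in mind (this is just the shape of
the loop as I read it, not the exact patch code):

while (wpos < file->nbytes)
{
    /* bytestowrite = size of the chunk written in this iteration */
    ...
    FileWrite(..., bytestowrite, ...);
    file->curOffset += bytestowrite;
    wpos += bytestowrite;
}
/* here bytestowrite is only the size of the *last* chunk */
file->curOffset -= bytestowrite;

so if the buffer gets flushed in more than one chunk, the correction
undershoots by everything written in the earlier iterations.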
regards
--
Tomas Vondra