Re: Proposal: Adding compression of temporary files

From: Filip Janus <fjanus(at)redhat(dot)com>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Adding compression of temporary files
Date: 2025-04-25 21:54:00
Message-ID: CAFjYY+LJ5_j82PrSxm2RjM0OjwfN9smJ1VKKMOka-25Jg-0ofg@mail.gmail.com
Lists: pgsql-hackers

The latest rebase.

-Filip-

On Tue, 22 Apr 2025 at 9:17, Filip Janus <fjanus(at)redhat(dot)com> wrote:

> Since the patch was prepared months ago, it needs to be rebased.
>
> -Filip-
>
>
> On Sun, 13 Apr 2025 at 21:53, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
> wrote:
>
>> > On Fri, Mar 28, 2025 at 09:23:13AM GMT, Filip Janus wrote:
>> > > + else
>> > > +     if (nbytesOriginal - file->pos != 0)
>> > > +         /*
>> > > +          * curOffset must be corrected also if compression is
>> > > +          * enabled; nbytes was changed by compression, but we
>> > > +          * have to use the original value of nbytes.
>> > > +          */
>> > > +         file->curOffset -= bytestowrite;
>> > >
>> > > It's not something introduced by the compression patch - the first
>> > > part is what we used to do before. But I find it a bit confusing -
>> > > isn't it mixing the "logical file position" correction we did before
>> > > with the adjustment possibly needed due to compression?
>> > >
>> > > In fact, isn't it going to fail if the code gets multiple loops in
>> > >
>> > > while (wpos < file->nbytes)
>> > > {
>> > > ...
>> > > }
>> > >
>> > > because bytestowrite will be the value from the last loop? I haven't
>> > > tried, but I guess writing wide tuples (more than 8k) might fail.
>> > >
>> >
>> > I will definitely test it with tuples larger than 8K.
>> >
>> > Maybe I don't understand it correctly: the adjustment is performed when
>> > file->nbytes and file->pos differ, so it must also happen when we are
>> > working with compressed data. The problem is that the compressed data
>> > stored on disk has a different size than the incoming uncompressed
>> > data, so the question is what the correction value should be. By
>> > debugging, I realized that the correction should correspond to the
>> > size of bytestowrite from the last iteration of the loop.
>>
>> I agree, this looks strange. If the idea is to set curOffset to its
>> original value + pos, and the original value was advanced multiple times
>> by bytestowrite, it seems incorrect to adjust it only once by the last
>> bytestowrite. From what I can see, the current tests do not exercise a
>> case where the while loop runs more than once, which is why it appears
>> to work.
>>
>> At the same time, maybe I'm missing something, but what exactly should
>> a test for 8k tuples and multiple loops in the while block look like?
>> E.g. when I force a hash join on a table with a single wide text column,
>> the minimal tuple that gets written to the temporary file still has a
>> rather small length, I assume due to TOASTing. Is there some other
>> way to achieve that?
>>
>>
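One possible way to construct such a test (a sketch only, not validated against the patch): use EXTERNAL storage so TOAST never compresses the column, and raise the table's toast_tuple_target so wide values stay inline rather than going out of line. Note that a heap tuple cannot exceed one page, so this gets close to 8 kB rather than over it; the row counts and sizes below are guesses:

```sql
-- Sketch: keep ~7 kB of incompressible text inline so the minimal tuples
-- a spilling hash join writes to temporary files approach page size.
CREATE TABLE wide_t (id int, payload text);
ALTER TABLE wide_t ALTER COLUMN payload SET STORAGE EXTERNAL;  -- no compression
ALTER TABLE wide_t SET (toast_tuple_target = 8160);            -- keep wide values inline

INSERT INTO wide_t
SELECT g, (SELECT string_agg(md5(g::text || i), '')
           FROM generate_series(1, 220) i)   -- 220 * 32 chars ~ 7040 bytes
FROM generate_series(1, 1000) g;

SET work_mem = '64kB';          -- force the hash join to spill
SET enable_mergejoin = off;
SET enable_nestloop = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM wide_t a JOIN wide_t b USING (id);
```

With work_mem kept tiny, the join should write near-page-size minimal tuples to its temporary files, which ought to drive the flush loop through a buffer boundary.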

Attachment Content-Type Size
0002-Add-test-for-temporary-files-compression-this-commit.patch application/octet-stream 126.1 KB
0001-This-commit-adds-support-for-temporary-files-compres.patch application/octet-stream 15.7 KB
