From: Filip Janus <fjanus(at)redhat(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Adding compression of temporary files
Date: 2025-03-28 08:23:13
Message-ID: CAFjYY+Lhw2tvsiC8_Dbx+K9dMaK=myT5Cck2=8FgLUW5f9Hp-A@mail.gmail.com
Lists: pgsql-hackers
-Filip-
On Mon, Mar 17, 2025 at 23:13, Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> On 3/15/25 11:40, Alexander Korotkov wrote:
> > On Sun, Jan 5, 2025 at 1:43 AM Filip Janus <fjanus(at)redhat(dot)com> wrote:
> >>
> >> I apologize for multiple messages, but I found a small bug in the
> previous version.
> >>
> >> -Filip-
> >
> > Great, thank you for your work.
> >
> > I think the patches could use a pgindent run.
> >
> > I don't see a reason why the temp file compression method should be
> > different from the wal compression methods, which we already have
> > in-tree. Perhaps it would be nice to have a 0001 patch, which would
> > abstract the compression methods we now have for wal into a separate
> > file containing GUC option values and functions for
> > compress/decompress. Then, 0002 would apply this to temporary file
> > compression.
> >
>
> Not sure I understand the design you're proposing ...
>
> AFAIK the WAL compression is not compressing the file data directly,
> it's compressing backup blocks one by one, which then get written to WAL
> as one piece of a record. So it's dealing with individual blocks, not
> files, and we already have API to compress blocks (well, it's pretty
> much the APIs for each compression method).
>
> You're proposing abstracting that into a separate file - what would be
> in that file? How would you abstract this to make it also useful for
> file compression?
>
> I can imagine a function CompressBuffer(method, dst, src, ...) wrapping
> the various compression methods, unifying the error handling, etc. I can
> imagine that, but that API is also limiting - e.g. how would that work
> with stream compression, which seems irrelevant for WAL, but might be
> very useful for tempfile compression.
>
> IIRC this is mostly why we didn't try to do such generic API for pg_dump
> compression, there's a local pg_dump-specific abstraction.
>
>
> FWIW looking at the patch, I still don't quite understand why it needs
> to correct the offset like this:
>
> + if (!file->compress)
> + file->curOffset -= (file->nbytes - file->pos);
>
This line of code is really confusing to me as well, and I wasn't able to
fully understand why it must be done. But I experimented with it, and if I
remember correctly, the correction is nonzero mainly in the last call to
BufFileDumpBuffer for a given data chunk.
> + else
> + if (nbytesOriginal - file->pos != 0)
> + /*
> + * curOffset must be corrected also if compression is
> + * enabled; nbytes was changed by compression, but we
> + * have to use the original value of nbytes.
> + */
> + file->curOffset -= bytestowrite;
>
> It's not something introduced by the compression patch - the first part
> is what we used to do before. But I find it a bit confusing - isn't it
> mixing the correction of "logical file position" adjustment we did
> before, and also the adjustment possibly needed due to compression?
>
> In fact, isn't it going to fail if the code gets multiple loops in
>
> while (wpos < file->nbytes)
> {
> ...
> }
>
> because bytestowrite will be the value from the last loop? I haven't
> tried, but I guess writing wide tuples (more than 8k) might fail.
>
I will definitely test it with tuples larger than 8K.
Maybe I don't understand it correctly, but the adjustment is performed
whenever file->nbytes and file->pos differ. So it must also happen when we
are working with compressed data. The problem is that the data stored
compressed on disk has a different size than the incoming uncompressed
data, so it is not obvious what the correction value should be.
By debugging, I realized that the correction should correspond to the size
of bytestowrite from the last iteration of the loop.
> regards
>
> --
> Tomas Vondra
>
>