Re: Compress ReorderBuffer spill files using LZ4

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Julien Tachoires <julmon(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Compress ReorderBuffer spill files using LZ4
Date: 2024-07-16 12:52:51
Message-ID: CAA4eK1JwbwjfL31oNxz-jS=XARvf4WNFONoW8b3Erv8RAvfURA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 16, 2024 at 12:58 AM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> On 7/15/24 20:50, Julien Tachoires wrote:
> > Hi,
> >
> > Le ven. 7 juin 2024 à 06:18, Julien Tachoires <julmon(at)gmail(dot)com> a écrit :
> >>
> >> Le ven. 7 juin 2024 à 05:59, Tomas Vondra
> >> <tomas(dot)vondra(at)enterprisedb(dot)com> a écrit :
> >>>
> >>> On 6/6/24 12:58, Julien Tachoires wrote:
> >>>> ...
> >>>>
> >>>> When compiled with LZ4 support (--with-lz4), this patch enables data
> >>>> compression/decompression of these temporary files. Each transaction
> >>>> change that must be written on disk (ReorderBufferDiskChange) is now
> >>>> compressed and encapsulated in a new structure.
> >>>>
> >>>
> >>> I'm a bit confused, but why tie this to having lz4? Why shouldn't this
> >>> be supported even for pglz, or whatever algorithms we add in the future?
> >>
> >> That's right, reworking this patch in that sense.
> >
> > Please find a new version of this patch adding support for LZ4, pglz
> > and ZSTD. It introduces the new GUC logical_decoding_spill_compression
> > which is used to set the compression method. In order to stay aligned
> > with the other server side GUCs related to compression methods
> > (wal_compression, default_toast_compression), the compression level is
> > not exposed to users.
> >
>
> Sounds reasonable. I wonder if it might be useful to allow specifying
> the compression level in those places, but that's clearly not something
> this patch needs to do.
>
> > The last patch of this set is still in WIP, it adds the machinery
> > required for setting the compression methods as a subscription option:
> > CREATE SUBSCRIPTION ... WITH (spill_compression = ...);
> > I think there is a major problem with this approach: the logical
> > decoding context is tied to one replication slot, but multiple
> > subscriptions can use the same replication slot. How should this work
> > if 2 subscriptions want to use the same replication slot but different
> > compression methods?
> >
>
> Do we really support multiple subscriptions sharing the same slot? I
> don't think we do, but maybe I'm missing something.
>
> > At this point, compression is only available for the changes spilled
> > on disk. It is still not clear to me if the compression of data
> > transiting through the streaming protocol should be addressed by this
> > patch set or by another one. Thought ?
> >
>
> I'd stick to only compressing the data spilled to disk. It might be
> useful to compress the streamed data too, but why shouldn't we compress
> the regular (non-streamed) transactions too? Yeah, it's more efficient
> to compress larger chunks, but we can fit quite large transactions into
> logical_decoding_work_mem without spilling.
>
> FWIW I'd expect that to be handled at the libpq level - there's already
> a patch for that, but I haven't checked if it would handle this. But
> maybe more importantly, I think compressing streamed data might need to
> handle some sort of negotiation of the compression algorithm, which
> seems fairly complex.
>
> To conclude, I'd leave this out of scope for this patch.
>

Your point sounds reasonable to me. OTOH, if we want to support
compression for spill case then shouldn't there be a question how
frequent such an option would be required? Users currently have an
option to stream large transactions for parallel apply or otherwise in
which case no spilling is required. I feel sooner or later we will
make such behavior (streaming=parallel) as default, and then spilling
should happen in very few cases. Is it worth adding this new option
and GUC if that is true?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Aleksander Alekseev 2024-07-16 13:09:44 [PATCH] Refactor pqformat.{c,h} and protocol.h
Previous Message Tom Lane 2024-07-16 12:40:47 Re: Why is 'use_alias' hardcoded to true in deparseFromExprForRel() for some cases