Re: Compress ReorderBuffer spill files using LZ4

From: Julien Tachoires <julmon(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Compress ReorderBuffer spill files using LZ4
Date: 2024-07-19 22:05:07
Message-ID: CAFEQCbH6gj3QHVjhkf7o=FcxCx4EZTJggqeODfsWa+OcpqJ=Tg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 17, 2024 at 02:12, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Jul 16, 2024 at 7:31 PM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> >
> > On 7/16/24 14:52, Amit Kapila wrote:
> > > On Tue, Jul 16, 2024 at 12:58 AM Tomas Vondra
> > > <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> > >>
> > >> FWIW I'd expect that to be handled at the libpq level - there's already
> > >> a patch for that, but I haven't checked if it would handle this. But
> > >> maybe more importantly, I think compressing streamed data might need to
> > >> handle some sort of negotiation of the compression algorithm, which
> > >> seems fairly complex.
> > >>
> > >> To conclude, I'd leave this out of scope for this patch.
> > >>
> > >
> > > Your point sounds reasonable to me. OTOH, if we want to support
> > > compression for spill case then shouldn't there be a question how
> > > frequent such an option would be required? Users currently have an
> > > option to stream large transactions for parallel apply or otherwise in
> > > which case no spilling is required. I feel sooner or later we will
> > > make such behavior (streaming=parallel) as default, and then spilling
> > > should happen in very few cases. Is it worth adding this new option
> > > and GUC if that is true?
> > >
> >
> > I don't know, but streaming is 'off' by default, and I'm not aware of
> > any proposals to change this, so when you suggest "sooner or later"
> > we'll change this, I'd probably bet on "later or never".
> >
> > I haven't been following the discussions about parallel apply very
> > closely, but my impression from dealing with similar stuff in other
> > tools is that it's rather easy to run into issues with some workloads,
> > which just makes me more skeptical about "streaming=parallel" by default.
> > But as I said, I'm out of the loop so I may be wrong ...
> >
>
> It is difficult to say whether enabling it by default will have issues
> or not but till now we haven't seen many reports for the streaming =
> 'parallel' option. It could be due to the reason that not many people
> enable it in their workloads. We can probably find out by enabling it
> by default.
>
> > As for whether the GUC is needed, I don't know. I guess we might do the
> > same thing we do for streaming - we don't have a GUC to enable this, but
> > we default to 'off' and the client has to request that when opening the
> > replication connection. So it'd be specified at the subscription level,
> > more or less.
> >
> > But then how would we specify compression for cases that invoke decoding
> > directly by pg_logical_slot_get_changes()? Through options?
> >
>
> If we decide to go with this then yeah that is one way, another
> possibility is to make it a slot's property, so we can allow to take a
> new parameter in pg_create_logical_replication_slot(). We can even
> think of inventing a new API to alter the slot's properties if we
> decide to go this route.

Please find attached a new version of this patch set. The compression
method is now set at the subscription level via CREATE SUBSCRIPTION or
ALTER SUBSCRIPTION, and it can be passed to
pg_logical_slot_get_changes()/pg_logical_slot_get_binary_changes()
through the spill_compression option.
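
For illustration, usage might look like the sketch below. Only the option
name spill_compression and the functions/commands involved come from the
patch set description above. The subscription, publication, and slot names,
the connection string, and the values 'lz4' and 'zstd' are hypothetical
(the values are guessed from the patch titles), and none of this works on an
unpatched server.

```sql
-- Hypothetical names: mysub, mypub, myslot, and the connection string.
-- Assumed values: 'lz4' / 'zstd' (inferred from the patch titles, not confirmed).

-- Set the spill-file compression method when creating a subscription.
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION mypub
    WITH (spill_compression = 'lz4');

-- Change the method later on an existing subscription.
ALTER SUBSCRIPTION mysub SET (spill_compression = 'zstd');

-- Pass the option when decoding directly through the SQL interface.
-- Options are given as name/value pairs after upto_lsn and upto_nchanges.
SELECT *
  FROM pg_logical_slot_get_changes('myslot', NULL, NULL,
                                   'spill_compression', 'lz4');
```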

> > BTW if we specify this at subscription level, will it be possible to
> > change the compression method?
> >
>
> This needs analysis but offhand I can't see the problems with it.

I didn't notice any issues: the compression method can be changed even
while decoding is in progress. In that case, the replication worker
restarts because of the parameter change.

JT

Attachment Content-Type Size
v3-0002-Fix-spill_bytes-counter.patch application/octet-stream 2.8 KB
v3-0004-Compress-ReorderBuffer-spill-files-using-ZSTD.patch application/octet-stream 14.5 KB
v3-0003-Compress-ReorderBuffer-spill-files-using-PGLZ.patch application/octet-stream 4.1 KB
v3-0005-Add-the-subscription-option-spill_compression.patch application/octet-stream 67.4 KB
v3-0001-Compress-ReorderBuffer-spill-files-using-LZ4.patch application/octet-stream 32.7 KB
v3-0006-Add-ReorderBuffer-ondisk-compression-tests.patch application/octet-stream 5.0 KB
