From: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Parallel CREATE INDEX for BRIN indexes |
Date: | 2023-11-29 20:30:32 |
Message-ID: | CAEze2Wi9nWxx2wDTHaBJi36JLQ+-JcTU0BcRh+yd8AR-9VNxTw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, 29 Nov 2023 at 18:55, Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> On 11/29/23 15:52, Tomas Vondra wrote:
> >> ...
> >>
> >> This also made me think a bit more about how we're working with the
> >> tuples. With your latest patch, we always deserialize and re-serialize
> >> the sorted brin tuples, just in case the next tuple will also be a
> >> BRIN tuple of the same page range. Could we save some of that
> >> deserialization time by optimistically expecting that we're not going
> >> to need to merge the tuple and only store a local copy of it locally?
> >> See attached 0002; this saves some cycles in common cases.
> >>
> >
> > Good idea!
> >
>
> FWIW there's a bug, in this part of the optimization:
>
> ------------------
> + if (memtuple == NULL)
> + memtuple = brin_deform_tuple(state->bs_bdesc, btup,
> + memtup_holder);
> +
> union_tuples(state->bs_bdesc, memtuple, btup);
> continue;
> ------------------
>
> The deforming should use prevbtup, otherwise union_tuples() jut combines
> two copies of the same tuple.
Good point. There were some more issues as well, fixes are attached.
> Which however brings me to the bigger issue with this - my stress test
> found this issue pretty quickly, but then I spent quite a bit of time
> trying to find what went wrong. I find this reworked code pretty hard to
> understand, and not necessarily because of how it's written. The problem
> is it the same loop tries to juggle multiple pieces of information with
> different lifespans, and so on. I find it really hard to reason about
> how it behaves ...
Yeah, it'd be nice if we had a peek option for sortsupport, that'd
improve context handling.
> I did try to measure how much it actually saves, but none of the tests I
> did actually found measurable improvement. So I'm tempted to just not
> include this part, and accept that we may deserialize some of the tuples
> unnecessarily.
>
> Did you actually observe measurable improvements in some cases?
The improvements would mostly stem from brin indexes with multiple
(potentially compressed) by-ref types, as they go through more complex
and expensive code to deserialize, requiring separate palloc() and
memcpy() calls each.
For single-column and by-value types the improvements are expected to
be negligible, because there is no meaningful difference between
copying a single by-ref value and copying its container; the
additional work done for each tuple is marginal for those.
For an 8-column BRIN index ((sha256((id)::text::bytea)::text),
(sha256((id+1)::text::bytea)::text),
(sha256((id+2)::text::bytea)::text), ...) instrumented with 0003 I
measured a difference of 10x less time spent in the main loop of
_brin_end_parallel, from ~30ms to 3ms when dealing with 55k 1-block
ranges. It's not a lot, but worth at least something, I guess?
The attached patch fixes the issue that you called out .
It also further updates _brin_end_parallel: the final 'write empty
tuples' loop is never hit and is thus removed, because if there were
any tuples in the spool we'd have filled the empty ranges at the end
of the main loop, and if there were no tuples in the spool then the
memtuple would still be at its original initialized value of 0 thus
resulting in a constant false condition. I also updated some comments.
Kind regards,
Matthias van de Meent
Neon (https://neon.tech)
Attachment | Content-Type | Size |
---|---|---|
v6-0003-NOCOMMIT-Instrumentation-for-time-spent-in-_brin_.patch | application/octet-stream | 2.7 KB |
v6-0002-Reduce-de-forming-of-BRIN-tuples-in-parallel-BRIN.patch | application/octet-stream | 7.6 KB |
v6-0001-Allow-BRIN-to-build-its-index-in-parallel.patch | application/octet-stream | 51.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2023-11-29 20:56:18 | Re: Parallel CREATE INDEX for BRIN indexes |
Previous Message | Andrew Dunstan | 2023-11-29 20:24:01 | Re: remaining sql/json patches |