Re: Parallel CREATE INDEX for BRIN indexes

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel CREATE INDEX for BRIN indexes
Date: 2023-11-30 00:10:48
Message-ID: c49dfd75-0a40-7feb-1018-54740e9a91db@enterprisedb.com
Lists: pgsql-hackers

On 11/29/23 23:59, Matthias van de Meent wrote:
> On Wed, 29 Nov 2023 at 21:56, Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>
>> On 11/29/23 21:30, Matthias van de Meent wrote:
>>> On Wed, 29 Nov 2023 at 18:55, Tomas Vondra
>>> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>>> I did try to measure how much it actually saves, but none of the tests I
>>>> ran showed a measurable improvement. So I'm tempted to just not
>>>> include this part, and accept that we may deserialize some of the tuples
>>>> unnecessarily.
>>>>
>>>> Did you actually observe measurable improvements in some cases?
>>>
>>> The improvements would mostly stem from brin indexes with multiple
>>> (potentially compressed) by-ref types, as they go through more complex
>>> and expensive code to deserialize, requiring a separate palloc() and
>>> memcpy() call for each value.
>>> For single-column and by-value types the improvements are expected to
>>> be negligible, because there is no meaningful difference between
>>> copying a single by-ref value and copying its container; the
>>> additional work done for each tuple is marginal for those.
>>>
>>> For an 8-column BRIN index ((sha256((id)::text::bytea)::text),
>>> (sha256((id+1)::text::bytea)::text),
>>> (sha256((id+2)::text::bytea)::text), ...) instrumented with 0003 I
>>> measured a difference of 10x less time spent in the main loop of
>>> _brin_end_parallel, from ~30ms to 3ms when dealing with 55k 1-block
>>> ranges. It's not a lot, but worth at least something, I guess?
>>>
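For reference, a rough sketch of the index shape being described; the table,
row count, and pages_per_range = 1 (to get 1-block ranges) are assumptions,
and encode()/convert_to() stand in for the casts used in the quoted
expressions:

    CREATE TABLE t AS SELECT i AS id FROM generate_series(1, 10000000) s(i);

    -- 8 by-ref expression columns, each producing a 64-character text value
    CREATE INDEX ON t USING brin (
        (encode(sha256(convert_to((id)::text,     'UTF8')), 'hex')),
        (encode(sha256(convert_to((id + 1)::text, 'UTF8')), 'hex')),
        (encode(sha256(convert_to((id + 2)::text, 'UTF8')), 'hex')),
        (encode(sha256(convert_to((id + 3)::text, 'UTF8')), 'hex'))
        -- ... and so on, up to 8 such expression columns
    ) WITH (pages_per_range = 1);
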
>>
>> It is something, but I can't really convince myself it's worth the extra
>> code complexity. It's a somewhat extreme example, and the parallelism
>> certainly saves much more than this.
>
> True. For this, I usually keep in mind that the docs on multi-column
> indexes still recommend one N-column BRIN index over N 1-column BRIN
> indexes (assuming the same storage parameters), so multi-column BRIN
> indexes should not be considered uncommon:
>
> "The only reason to have multiple BRIN indexes instead of one
> multicolumn BRIN index on a single table is to have a different
> pages_per_range storage parameter."
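
As a concrete illustration of that doc passage (table and column names are
made up), the choice is between one multicolumn index and several
single-column ones that only make sense with different pages_per_range
settings:

    -- one multicolumn BRIN index covering both columns
    CREATE INDEX ON tab USING brin (a, b);

    -- vs. separate per-column indexes, justified only by differing
    -- pages_per_range storage parameters
    CREATE INDEX ON tab USING brin (a) WITH (pages_per_range = 128);
    CREATE INDEX ON tab USING brin (b) WITH (pages_per_range = 16);
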
>
> Note that most of the time in my example index is spent in creating
> the actual tuples due to the use of hashing for data generation; for
> index or plain to-text formatting the improvement is much more
> pronounced: If I use an 8-column index (id::text, id, ...), index
> creation takes ~500ms with 4+ workers. Of this, deforming takes some
> 20ms, though when skipping the deforming step (i.e., with my patch) it
> takes ~3.5ms. That's about 3% shaved off the build time when the index
> shape is beneficial.
>
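
A hedged sketch of this second benchmark shape; the table and the exact
expressions are again assumptions, and actually getting 4 workers may also
require lowering min_parallel_table_scan_size or setting the table's
parallel_workers reloption:

    SET max_parallel_maintenance_workers = 4;

    -- 8-column index mixing cheap to-text expressions and plain integers
    CREATE INDEX ON t USING brin (
        (id::text), id,
        ((id + 1)::text), (id + 1),
        ((id + 2)::text), (id + 2),
        ((id + 3)::text), (id + 3)
    );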

That's all true, and while 3.5% is not something to ignore, my POV is
that the parallelism speeds this up from ~2000ms to ~500ms. Yes, it
would be great to shave off the extra 1% (relative to the original
duration). But I don't have a great idea how to do that in a way that
keeps the code readable, and I don't want to stall the patch indefinitely
because of a comparatively small improvement.

Therefore I propose we get the simpler code committed and leave this as
a future improvement.

>>> The attached patch fixes the issue that you called out.
>>> It also further updates _brin_end_parallel: the final 'write empty
>>> tuples' loop is never hit and is thus removed, because if there were
>>> any tuples in the spool we'd have filled the empty ranges at the end
>>> of the main loop, and if there were no tuples in the spool then the
>>> memtuple would still be at its original initialized value of 0, thus
>>> resulting in a constant false condition. I also updated some comments.
>>>
>>
>> Ah, right. I'll take a look tomorrow, but I guess I didn't realize we
>> insert the empty ranges in the main loop, because we're already looking
>> at the *next* summary.
>
> Yes, merging adds some significant complexity here. I don't think we
> can easily get around that though...
>
>> But I think the idea was to insert empty ranges if there's a chunk of
>> empty ranges at the end of the table, after the last tuple the index
>> build reads. But I'm not sure that can actually happen ...
>
> This would be trivial to construct with partial indexes; e.g. WHERE
> (my_pk IS NULL) would consist of exclusively empty ranges.
> I don't see a lot of value in partial BRIN indexes, but I may be
> overlooking something.
>

Oh, I haven't even thought about partial BRIN indexes! I'm sure for
those it's even more important to actually fill in the empty ranges,
otherwise we end up scanning the whole supposedly filtered-out part of
the table. I'll do some testing with that.
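
Something along these lines is what I'd test (table name and data are of
course made up); the predicate matches no rows, so every range covered by the
index is empty:

    CREATE TABLE t2 (my_pk int PRIMARY KEY, payload text);
    INSERT INTO t2 SELECT i, repeat('x', 100) FROM generate_series(1, 1000000) i;

    -- partial BRIN index whose predicate matches no rows at all
    CREATE INDEX ON t2 USING brin (my_pk) WHERE (my_pk IS NULL);

    -- a query matching the predicate; EXPLAIN (ANALYZE, BUFFERS) shows how
    -- much of the heap the bitmap scan actually ends up visiting
    EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t2 WHERE my_pk IS NULL;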

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
