Re: Parallel CREATE INDEX for GIN indexes

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel CREATE INDEX for GIN indexes
Date: 2024-07-08 09:45:55
Message-ID: CAEze2WiYtV1DHDi=dS25h+EB3vGX1Q6FoiBxPcU7YjXMKb2_3g@mail.gmail.com
Lists: pgsql-hackers

On Sun, 7 Jul 2024, 18:26 Tomas Vondra, <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> On 7/5/24 21:45, Matthias van de Meent wrote:
>> On Wed, 3 Jul 2024 at 20:36, Matthias van de Meent
>> <boekewurm+postgres(at)gmail(dot)com> wrote:
>>>
>>> On Mon, 24 Jun 2024 at 02:58, Tomas Vondra
>>> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>>> So I've switched this to use the regular data-type comparisons, with
>>>> SortSupport etc. There's a bit more cleanup remaining and testing
>>>> needed, but I'm not aware of any bugs.
>>> ---
>>>> +++ b/src/backend/utils/sort/tuplesortvariants.c
>>>
>>> I was thinking some more about merging tuples inside the tuplesort. I
>>> realized that this could be implemented by allowing buffering of tuple
>>> writes in writetup. This would require adding a flush operation at the
>>> end of mergeonerun to store the final unflushed tuple on the tape, but
>>> that shouldn't be too expensive. This buffering, when implemented
>>> through e.g. a GinBuffer in TuplesortPublic->arg, could allow us to
>>> merge the TID lists of same-valued GIN tuples while they're getting
>>> stored and re-sorted, thus reducing the temporary space usage of the
>>> tuplesort by some amount with limited overhead for other
>>> non-deduplicating tuplesorts.
>>>
>>> I've not yet spent the time to get this to work though, but I'm fairly
>>> sure it'd use less temporary space than the current approach with the
>>> 2 tuplesorts, and could have lower overall CPU overhead as well
>>> because the number of sortable items gets reduced much earlier in the
>>> process.
>>
>> I've now spent some time on this. Attached the original patchset, plus
>> 2 incremental patches, the first of which implement the above design
>> (patch no. 8).
>>
>> Local tests show it's significantly faster: for the below test case
>> I've seen reindex time reduced from 777455ms to 626217ms, or ~20%
>> improvement.
>>
>> After applying the 'reduce the size of GinTuple' patch, index creation
>> time is down to 551514ms, or about 29% faster total. This all was
>> tested with a fresh stock postgres configuration.
>>
>> """
>> CREATE UNLOGGED TABLE testdata
>> AS SELECT sha256(i::text::bytea)::text
>> FROM generate_series(1, 15000000) i;
>> CREATE INDEX ON testdata USING gin (sha256 gin_trgm_ops);
>> """
>>
>
> Those results look really promising. I certainly would not have expected
> such improvements - 20-30% speedup on top of the already parallel run
> seems great. I'll do a bit more testing to see how much this helps for
> the "more realistic" data sets.
>
> I can't say I 100% understand how the new stuff in tuplesortvariants.c
> works, but that's mostly because my knowledge of tuplesort internals is
> fairly limited (and I managed to survive without that until now).
>
> What makes me a bit unsure/uneasy is that this seems to "inject" custom
> code fairly deep into the tuplesort logic. I'm not quite sure if this is
> a good idea ...

I thought it was still fairly high-level: it adds (what seems to me) a
natural extension to the pre-existing "write a tuple to the tape" API,
allowing the Tuplesort (TS) implementation to further optimize its
ordered tape writes through buffering (and combining) of tuple writes.
While it does remove the current 1:1 relation of TS tape writes to
tape reads for the GIN case, there is AFAIK no code in TS that relies
on that 1:1 relation.

As to the GIN TS code itself: yes, it's more complicated, mainly
because it uses several optimizations to reduce unnecessary
allocations and (de)serializations of GinTuples, and I'm aware of even
more such optimizations that could be added at some point.

As an example: I suspect the use of GinBuffer in writetup_index_gin to
be a significant resource drain in my patch because it lacks
"freezing" in the tuplesort buffer. When no duplicate key values are
encountered, the code is nearly optimal (except for a full tuple copy
to get the data into the GinBuffer's memory context), but if more than
one GinTuple has the same key in the merge phase, we deserialize both
tuples' posting lists and merge the two. I suspect that merge is more
expensive than operating on the compressed posting lists of the
GinTuples themselves, so that's something I think could be improved. I
suspect/guess it could save another 10% in select cases, and it will
definitely reduce the memory footprint of the buffer.
Another thing that can be optimized is the current approach of
inserting data into the index: I think it's kind of wasteful to
decompress and later re-compress the posting lists once we start
storing the tuples on disk.

Kind regards,

Matthias van de Meent
