From: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
---|---|
To: | |
Cc: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Parallel CREATE INDEX for GIN indexes |
Date: | 2024-07-05 19:45:31 |
Message-ID: | CAEze2WiTAeZe4t5wAeRN834xFBqROPmjeK2XTstNko6bbVPX=A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, 3 Jul 2024 at 20:36, Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
>
> On Mon, 24 Jun 2024 at 02:58, Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> > So I've switched this to use the regular data-type comparisons, with
> > SortSupport etc. There's a bit more cleanup remaining and testing
> > needed, but I'm not aware of any bugs.
I've hit assertion failures in my testing of the combined patches, in
AssertCheckItemPointers: it assumes it's never called when the buffer
is empty and uninitialized, but that's wrong: we don't initialize the
items array until the first tuple, which will cause the assertion to
fire. By updating the first 2 assertions of AssertCheckItemPointers, I
could get it working.
> ---
> > +++ b/src/backend/utils/sort/tuplesortvariants.c
>
> I was thinking some more about merging tuples inside the tuplesort. I
> realized that this could be implemented by allowing buffering of tuple
> writes in writetup. This would require adding a flush operation at the
> end of mergeonerun to store the final unflushed tuple on the tape, but
> that shouldn't be too expensive. This buffering, when implemented
> through e.g. a GinBuffer in TuplesortPublic->arg, could allow us to
> merge the TID lists of same-valued GIN tuples while they're getting
> stored and re-sorted, thus reducing the temporary space usage of the
> tuplesort by some amount with limited overhead for other
> non-deduplicating tuplesorts.
>
> I've not yet spent the time to get this to work though, but I'm fairly
> sure it'd use less temporary space than the current approach with the
> 2 tuplesorts, and could have lower overall CPU overhead as well
> because the number of sortable items gets reduced much earlier in the
> process.
I've now spent some time on this. Attached the original patchset, plus
2 incremental patches, the first of which implement the above design
(patch no. 8).
Local tests show it's significantly faster: for the below test case
I've seen reindex time reduced from 777455ms to 626217ms, or ~20%
improvement.
After applying the 'reduce the size of GinTuple' patch, index creation
time is down to 551514ms, or about 29% faster total. This all was
tested with a fresh stock postgres configuration.
"""
CREATE UNLOGGED TABLE testdata
AS SELECT sha256(i::text::bytea)::text
FROM generate_series(1, 15000000) i;
CREATE INDEX ON testdata USING gin (sha256 gin_trgm_ops);
"""
Kind regards,
Matthias van de Meent
Neon (https://neon.tech)
Attachment | Content-Type | Size |
---|---|---|
v20240705-0003-Remove-the-explicit-pg_qsort-in-workers.patch | application/x-patch | 10.2 KB |
v20240705-0004-Compress-TID-lists-before-writing-tuples-t.patch | application/x-patch | 7.9 KB |
v20240705-0005-Collect-and-print-compression-stats.patch | application/x-patch | 5.7 KB |
v20240705-0002-Use-mergesort-in-the-leader-process.patch | application/x-patch | 12.6 KB |
v20240705-0001-Allow-parallel-create-for-GIN-indexes.patch | application/x-patch | 61.7 KB |
v20240705-0006-Enforce-memory-limit-when-combining-tuples.patch | application/x-patch | 14.0 KB |
v20240705-0009-Reduce-the-size-of-GinTuple-by-12-bytes.patch | application/x-patch | 5.5 KB |
v20240705-0007-Detect-wrap-around-in-parallel-callback.patch | application/x-patch | 8.1 KB |
v20240705-0008-Use-a-single-GIN-tuplesort.patch | application/x-patch | 32.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Nathan Bossart | 2024-07-05 19:50:39 | Re: remove check hooks for GUCs that contribute to MaxBackends |
Previous Message | Marat Bukharov | 2024-07-05 19:43:56 | Re: [PATCH] Add min/max aggregate functions to BYTEA |