Re: Parallel CREATE INDEX for GIN indexes

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Kirill Reshke <reshkekirill(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel CREATE INDEX for GIN indexes
Date: 2025-01-04 16:58:36
Message-ID: 6d4038c0-c8ea-4742-b0d7-58fea26a51c1@vondra.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/24/24 19:04, Kirill Reshke wrote:
> On Tue, 8 Oct 2024 at 17:06, Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>>
>> On 10/8/24 04:03, Michael Paquier wrote:
>>>
>>> _gin_parallel_build_main() is introduced in 0001. Please make sure to
>>> pass down a query ID.
>>
>> Thanks for the ping. Here's an updated patch doing that, and also fixing
>> a couple whitespace issues. No other changes, but I plan to get back to
>> this patch soon - before the next CF.
>>
>>
>> regards
>>
>> --
>> Tomas Vondra
>
> Hi! I was looking through this series of patches because thread of
> GIN&GIST amcheck patch references it.
>
> I have spotted this in gininsert.c:
> 1)
>> /*
>> * Store shared tuplesort-private state, for which we reserved space.
>> * Then, initialize opaque state using tuplesort routine.
>> */
>> sharedsort = (Sharedsort *) shm_toc_allocate(pcxt->toc, estsort);
>> tuplesort_initialize_shared(sharedsort, scantuplesortstates,>
> pcxt->seg);
>> /*
>> * Store shared tuplesort-private state, for which we reserved space.
>> * Then, initialize opaque state using tuplesort routine.
>> */
>
> Is it necessary to duplicate the entire comment?
>

Yes, that's a copy-paste mistake. Removed the second comment.

> And, while we are here, isn't it " initialize the opaque state "?
>

Not sure, this is copy pasted as-is from nbtree code.

> 2) typo :
> * the TID array, and returning false if it's too large (more thant work_mem,
>

Fixed.

> 3) in _gin_build_tuple:
>
> ....
> else if (typlen == -2)
> keylen = strlen(DatumGetPointer(key)) + 1;
> else
> elog(ERROR, "invalid typlen");
>
>
> Maybe `elog(ERROR, "invalid typLen: %d", typLen); ` as in `datumGetSize`?
>

Makes sense, I reworded it a little bit. But it's however supposed to be
a can't happen condition.

> 4) in _gin_compare_tuples:
>
>> if ((a->category == GIN_CAT_NORM_KEY) &&
> (b->category == GIN_CAT_NORM_KEY))
>
> maybe just a->category == GIN_CAT_NORM_KEY? a->category is already
> equal to b->category because of previous if statements.
>

True. I've simplified the condition.

> 5) In _gin_partition_sorted_data:
>
>> char fname[128];
>> sprintf(fname, "worker-%d", i);
>
> Other places use MAXPGPATH in similar cases.
>

OK, fixed the two places that format worker-%d.

> Also, code `sprintf(fname, "worker-%d",...);` duplicates. This might
> be error-prone. Should we have a macro/inline function for this?
>

Maybe. I think using a constant might be a good idea, but anything more
complicated is not worth it. There's only two places using it, not very
far apart.

> I will take another look later, maybe reporting real problems, not nit-picks.
>

Thanks. Attached is a rebased patch series fixing those issues, and one
issue I found in an AssertCheckGinBuffer, which was calling the other
assert (AssertCheckItemPointers) even for empty buffers. I think this
part might need some more work, so that it's clear what the various
asserts assume (or rather to allow just calling AssertCheckGinBuffer
everywhere, with some flags).

I still need to go through the comments / question by Matthias and Andy
Fan, that I missed when they posted them in August.

My plan is to eventually commit the first couple patches, possibly up
0007 or even 0009. The rest would be left as an improvement for the
future. I need to figure out how to squash the patches - I don't want to
squash this into a single much-harder-to-understand commit, but maybe it
has too many parts.

regards

--
Tomas Vondra

Attachment Content-Type Size
v20250104-0001-Allow-parallel-create-for-GIN-indexes.patch text/x-patch 61.9 KB
v20250104-0002-Use-mergesort-in-the-leader-process.patch text/x-patch 12.6 KB
v20250104-0003-Remove-the-explicit-pg_qsort-in-workers.patch text/x-patch 10.6 KB
v20250104-0004-Compress-TID-lists-before-writing-tuples-t.patch text/x-patch 7.9 KB
v20250104-0005-Collect-and-print-compression-stats.patch text/x-patch 5.7 KB
v20250104-0006-Enforce-memory-limit-when-combining-tuples.patch text/x-patch 14.0 KB
v20250104-0007-Detect-wrap-around-in-parallel-callback.patch text/x-patch 8.1 KB
v20250104-0008-Use-a-single-GIN-tuplesort.patch text/x-patch 33.1 KB
v20250104-0009-Reduce-the-size-of-GinTuple-by-12-bytes.patch text/x-patch 5.5 KB
v20250104-0010-WIP-parallel-inserts-into-GIN-index.patch text/x-patch 18.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey M. Borodin 2025-01-04 18:15:27 Re: Sort functions with specialized comparators
Previous Message Michel Pelletier 2025-01-04 16:37:39 Re: Using Expanded Objects other than Arrays from plpgsql