Quick Links

Re: Parallel CREATE INDEX for GIN indexes

From:	Tomas Vondra <tomas(at)vondra(dot)me>
To:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc:	Kirill Reshke <reshkekirill(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Parallel CREATE INDEX for GIN indexes
Date:	2025-02-12 17:52:50
Message-ID:	1b967857-ce2e-4045-bf78-d3a49c037536@vondra.me
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 2/12/25 15:59, Matthias van de Meent wrote:
> On Tue, 7 Jan 2025 at 12:59, Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>>
>> On 1/6/25 20:13, Matthias van de Meent wrote:
>>> ...
>>>>
>>>> Thanks. Attached is a rebased patch series fixing those issues, and one
>>>> issue I found in an AssertCheckGinBuffer, which was calling the other
>>>> assert (AssertCheckItemPointers) even for empty buffers. I think this
>>>> part might need some more work, so that it's clear what the various
>>>> asserts assume (or rather to allow just calling AssertCheckGinBuffer
>>>> everywhere, with some flags).
>>>
>>> Thanks for the rebase.
>>>
>>>> 0001
>>>> + * mutex protects all fields before heapdesc.
>>>
>>> This comment is still inaccurate.
>>>
>>
>> Hmm, yeah. But this comment originates from btree, so maybe it's wrong
>> there (and in BRIN too)? I believe it refers to the descriptors stored
>> after the struct, i.e. it means "all fields after the mutex".
>
> Yeah, I think that's just the comment that needs updating.
>
>>>> + /* FIXME likely duplicate with indtuples */
>>>
>>> I think this doesn't have to be duplicate, as we can distinguish
>>> between number of heap tuples and the number of GIN (key, TID) pairs
>>> loaded. This distinction doesn't really exist anywhere else, though,
>>> so to expose this to users we may need changes in
>>> pg_stat_progress_create_index.
>>>
>>> While I haven't checked if that distinction is being made in the code,
>>> I think it would be a useful distinction to have.
>>>
>>
>> I haven't done anything about this, but I'm not sure adding the number
>> of GIN tuples to pg_stat_progress_create_index would be very useful. We
>> don't know the total number of entries, so it can't show the progress.
>
> For btree scans, we update the number of to-be-inserted tuples
> together with the number of blocks scanned. Can we do something
> similar with GIN?
>
> Can we track data for pg_stat_progress_create_index?
>
>>>> GinBufferInit
>>>
>>> This seems to depend on the btree operator classes to get sortsupport
>>> functions, bypassing the GIN compare support function (support
>>> function 1) and adding a dependency on the btree opclasses for
>>> indexable types. This can cause "bad" ordering, or failure to build
>>> the index when the parallel path is chosen and no default btree
>>> opclass is defined for the type. I think it'd be better if we allowed
>>> users to specify which sortsupport function to use, or at least use
>>> the correct compare function when it's defined on the attribute's
>>> operator class.
>>>
>>
>> Good point! I fixed this by copying the logic from initGinState.
>>
>>>> include/access/gin_tuple.h
>>>> + OffsetNumber attrnum; /* attnum of index key */
>>>
>>> I think this would best be AttrNumber-typed? Looks like I didn't
>>> notice or fix that in 0009.
>>>
>>
>> You're probably right, but I see the GIN code uses OffsetNumber for
>> attrnum in a number of places. I wonder why is that. I don't think it
>> can be harmful, because we can't have GIN on system columns, right?
>
> Indeed, indexes on system columns are not supported, which includes GIN indexes.
>
>>>> I need to figure out how to squash the patches - I don't want to
>>>> squash this into a single much-harder-to-understand commit, but maybe it
>>>> has too many parts.
>
> I think the following would be good:
>
> Commits:
> 1.) 0001 (parallel create) + 0009 (reduce the size of ...) + 0002
> (mergesort) + 0003 (remove explicit pg_qsort) + 0007 (detect
> wrap-around)
> 2.) 0004 (compress) + 0006 (enforce memory limit)
> 3.) 0008 (single tuplesort)
>

Thanks. It's been so long since I looked at the patches that I don't
quite recall all the details - it's almost as if it was authored by
someone else ;-)

But your proposal makes sense. I was torn between committing this in
smaller "increments" and squashing it like this. The smaller steps are
easier to follow and debug. But that mattered during the development,
and evaluation of those changes. It's not that useful for commit,
because the parts will get pushed over a relatively short time period
anyway.

> Note that 0009 is a drop-in improvement, so I don't think order makes
> much of a difference there.
>

True.

> IIUC, 0005 was only for development insights, and not proposed to get
> committed. If that was wrong, I'd squash it into the second commit,
> together with 0004/0006.
>

Nope, it was for development only. I'll however consider keeping 0004
and 0006 separate, because those seem like pretty separate changes. I
don't see much point in not committing them in (1) and then squashing
them together into a single commit.

> I'll try to provide a more polished version of 0008 soon, with
> improved comments/commit message, however that'll depend on me not
> getting distracted with $job items first; it's taken quite some time
> recently.
>

Cool, thanks!

regards

--
Tomas Vondra

In response to

Re: Parallel CREATE INDEX for GIN indexes at 2025-02-12 14:59:08 from Matthias van de Meent

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Robert Haas	2025-02-12 18:00:22	Re: AIO v2.3
Previous Message	Tom Lane	2025-02-12 17:38:21	Re: Small memory fixes for pg_createsubcriber