Re: Parallel CREATE INDEX for GIN indexes

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Parallel CREATE INDEX for GIN indexes
Date: 2024-07-07 16:10:49
Message-ID: 90240daa-a1d7-470c-a8ba-6c5343b8bd60@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 7/3/24 20:36, Matthias van de Meent wrote:
> On Mon, 24 Jun 2024 at 02:58, Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>
>> Here's a bit more cleaned up version, clarifying a lot of comments,
>> removing a bunch of obsolete comments, or comments speculating about
>> possible solutions, that sort of thing. I've also removed couple more
>> XXX comments, etc.
>>
>> The main change however is that the sorting no longer relies on memcmp()
>> to compare the values. I did that because it was enough for the initial
>> WIP patches, and it worked till now - but the comments explained this
>> may not be a good idea if the data type allows the same value to have
>> multiple binary representations, or something like that.
>>
>> I don't have a practical example to show an issue, but I guess if using
>> memcmp() was safe we'd be doing it in a bunch of places already, and
>> AFAIK we're not. And even if it happened to be OK, this is a probably
>> not the place where to start doing it.
>
> I think one such example would be the values '5.00'::jsonb and
> '5'::jsonb when indexed using GIN's jsonb_ops, though I'm not sure if
> they're treated as having the same value inside the opclass' ordering.
>

Yeah, possibly. But doing the comparison the "proper" way seems to be
working pretty well, so I don't plan to investigate this further.

>> So I've switched this to use the regular data-type comparisons, with
>> SortSupport etc. There's a bit more cleanup remaining and testing
>> needed, but I'm not aware of any bugs.
>
> A review of patch 0001:
>
> ---
>
>> src/backend/access/gin/gininsert.c | 1449 +++++++++++++++++++-
>
> The nbtree code has `nbtsort.c` for its sort- and (parallel) build
> state handling, which is exclusively used during index creation. As
> the changes here seem to be largely related to bulk insertion, how
> much effort would it be to split the bulk insertion code path into a
> separate file?
>

Hmmm. I haven't tried doing that, but I guess it's doable. I assume we'd
want to do the move first, because it involves pre-existing code, and
then do the patch on top of that.

But what would be the benefit of doing that? I'm not sure doing it just
to make it look more like btree code is really worth it. Do you expect
the result to be clearer?

> I noticed that new fields in GinBuildState do get to have a
> bs_*-prefix, but none of the other added or previous fields of the
> modified structs in gininsert.c have such prefixes. Could this be
> unified?
>

Yeah, these are inconsistencies from copying the infrastructure code to
make the parallel builds work, etc.

>> +/* Magic numbers for parallel state sharing */
>> +#define PARALLEL_KEY_GIN_SHARED UINT64CONST(0xB000000000000001)
>> ...
>
> These overlap with BRIN's keys; can we make them unique while we're at it?
>

We could, and I recall we had a similar discussion in the parallel BRIN
thread, right?. But I'm somewhat unsure why would we actually want/need
these keys to be unique. Surely, we don't need to mix those keys in the
single shm segment, right? So it seems more like an aesthetic thing. Or
is there some policy to have unique values for these keys?

>> + * mutex protects all fields before heapdesc.
>
> I can't find the field that this `heapdesc` might refer to.
>

Yeah, likely a leftover from copying a bunch of code and then removing
it without updating the comment. Will fix.

>> +_gin_begin_parallel(GinBuildState *buildstate, Relation heap, Relation index,
>> ...
>> + if (!isconcurrent)
>> + snapshot = SnapshotAny;
>> + else
>> + snapshot = RegisterSnapshot(GetTransactionSnapshot());
>
> grumble: I know this is required from the index with the current APIs,
> but I'm kind of annoyed that each index AM has to construct the table
> scan and snapshot in their own code. I mean, this shouldn't be
> meaningfully different across AMs, so every AM implementing this same
> code makes me feel like we've got the wrong abstraction.
>
> I'm not asking you to change this, but it's one more case where I'm
> annoyed by the state of the system, but not quite enough yet to change
> it.
>

Yeah, it's not great, but not something I intend to rework.

> ---
>> +++ b/src/backend/utils/sort/tuplesortvariants.c
>
> I was thinking some more about merging tuples inside the tuplesort. I
> realized that this could be implemented by allowing buffering of tuple
> writes in writetup. This would require adding a flush operation at the
> end of mergeonerun to store the final unflushed tuple on the tape, but
> that shouldn't be too expensive. This buffering, when implemented
> through e.g. a GinBuffer in TuplesortPublic->arg, could allow us to
> merge the TID lists of same-valued GIN tuples while they're getting
> stored and re-sorted, thus reducing the temporary space usage of the
> tuplesort by some amount with limited overhead for other
> non-deduplicating tuplesorts.
>
> I've not yet spent the time to get this to work though, but I'm fairly
> sure it'd use less temporary space than the current approach with the
> 2 tuplesorts, and could have lower overall CPU overhead as well
> because the number of sortable items gets reduced much earlier in the
> process.
>

Will respond to your later message about this.

> ---
>
>> +++ b/src/include/access/gin_tuple.h
>> + typedef struct GinTuple
>
> I think this needs some more care: currently, each GinTuple is at
> least 36 bytes in size on 64-bit systems. By using int instead of Size
> (no normal indexable tuple can be larger than MaxAllocSize), and
> packing the fields better we can shave off 10 bytes; or 12 bytes if
> GinTuple.keylen is further adjusted to (u)int16: a key needs to fit on
> a page, so we can probably safely assume that the key size fits in
> (u)int16.
>

Yeah, I guess using int64 is a bit excessive - you're right about that.
I'm not sure this is necessarily about "indexable tuples" (GinTuple is
not indexed, it's more an intermediate representation). But if we can
make it smaller, that probably can't hurt.

I don't have a great intuition on how beneficial this might be. For
cases with many TIDs per index key, it probably won't matter much. But
if there's many keys (so that GinTuples store only very few TIDs), it
might make a difference.

I'll try to measure the impact on the same "realistic" cases I used for
the earlier steps.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2024-07-07 16:26:03 Re: Parallel CREATE INDEX for GIN indexes
Previous Message Fujii Masao 2024-07-07 16:03:42 pg_maintain and USAGE privilege on schema