Re: Parallel CREATE INDEX for GIN indexes

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Parallel CREATE INDEX for GIN indexes
Date: 2024-07-03 18:36:26
Message-ID: CAEze2WjB1vpxtvKuWVEThSaB-v4+8H0EXsOB=yLAv8pLcrQuKw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 24 Jun 2024 at 02:58, Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> Here's a bit more cleaned up version, clarifying a lot of comments,
> removing a bunch of obsolete comments, or comments speculating about
> possible solutions, that sort of thing. I've also removed couple more
> XXX comments, etc.
>
> The main change however is that the sorting no longer relies on memcmp()
> to compare the values. I did that because it was enough for the initial
> WIP patches, and it worked till now - but the comments explained this
> may not be a good idea if the data type allows the same value to have
> multiple binary representations, or something like that.
>
> I don't have a practical example to show an issue, but I guess if using
> memcmp() was safe we'd be doing it in a bunch of places already, and
> AFAIK we're not. And even if it happened to be OK, this is a probably
> not the place where to start doing it.

I think one such example would be the values '5.00'::jsonb and
'5'::jsonb when indexed using GIN's jsonb_ops, though I'm not sure if
they're treated as having the same value inside the opclass' ordering.

> So I've switched this to use the regular data-type comparisons, with
> SortSupport etc. There's a bit more cleanup remaining and testing
> needed, but I'm not aware of any bugs.

A review of patch 0001:

---

> src/backend/access/gin/gininsert.c | 1449 +++++++++++++++++++-

The nbtree code has `nbtsort.c` for its sort- and (parallel) build
state handling, which is exclusively used during index creation. As
the changes here seem to be largely related to bulk insertion, how
much effort would it be to split the bulk insertion code path into a
separate file?

I noticed that new fields in GinBuildState do get to have a
bs_*-prefix, but none of the other added or previous fields of the
modified structs in gininsert.c have such prefixes. Could this be
unified?

> +/* Magic numbers for parallel state sharing */
> +#define PARALLEL_KEY_GIN_SHARED UINT64CONST(0xB000000000000001)
> ...

These overlap with BRIN's keys; can we make them unique while we're at it?

> + * mutex protects all fields before heapdesc.

I can't find the field that this `heapdesc` might refer to.

> +_gin_begin_parallel(GinBuildState *buildstate, Relation heap, Relation index,
> ...
> + if (!isconcurrent)
> + snapshot = SnapshotAny;
> + else
> + snapshot = RegisterSnapshot(GetTransactionSnapshot());

grumble: I know this is required from the index with the current APIs,
but I'm kind of annoyed that each index AM has to construct the table
scan and snapshot in their own code. I mean, this shouldn't be
meaningfully different across AMs, so every AM implementing this same
code makes me feel like we've got the wrong abstraction.

I'm not asking you to change this, but it's one more case where I'm
annoyed by the state of the system, but not quite enough yet to change
it.

---
> +++ b/src/backend/utils/sort/tuplesortvariants.c

I was thinking some more about merging tuples inside the tuplesort. I
realized that this could be implemented by allowing buffering of tuple
writes in writetup. This would require adding a flush operation at the
end of mergeonerun to store the final unflushed tuple on the tape, but
that shouldn't be too expensive. This buffering, when implemented
through e.g. a GinBuffer in TuplesortPublic->arg, could allow us to
merge the TID lists of same-valued GIN tuples while they're getting
stored and re-sorted, thus reducing the temporary space usage of the
tuplesort by some amount with limited overhead for other
non-deduplicating tuplesorts.

I've not yet spent the time to get this to work though, but I'm fairly
sure it'd use less temporary space than the current approach with the
2 tuplesorts, and could have lower overall CPU overhead as well
because the number of sortable items gets reduced much earlier in the
process.

---

> +++ b/src/include/access/gin_tuple.h
> + typedef struct GinTuple

I think this needs some more care: currently, each GinTuple is at
least 36 bytes in size on 64-bit systems. By using int instead of Size
(no normal indexable tuple can be larger than MaxAllocSize), and
packing the fields better we can shave off 10 bytes; or 12 bytes if
GinTuple.keylen is further adjusted to (u)int16: a key needs to fit on
a page, so we can probably safely assume that the key size fits in
(u)int16.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Harjyot Bagga 2024-07-03 18:55:07 Grammar guidelines in Postgres
Previous Message Tomas Vondra 2024-07-03 18:18:06 Re: Recommended books for admin