Re: Parallel CREATE INDEX for GIN indexes

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel CREATE INDEX for GIN indexes
Date: 2024-07-08 15:35:42
Message-ID: CAEze2Wim=EysPVMvdk9WAp0k4+eB5wpef+=N0WBvUC3hVBwrbw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 8 Jul 2024, 13:38 Tomas Vondra, <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> On 7/8/24 11:45, Matthias van de Meent wrote:
> > As to the GIN TS code itself; yes it's more complicated, mainly
> > because it uses several optimizations to reduce unnecessary
> > allocations and (de)serializations of GinTuples, and I'm aware of even
> > more such optimizations that can be added at some point.
> >
> > As an example: I suspect the use of GinBuffer in writetup_index_gin to
> > be a significant resource drain in my patch because it lacks
> > "freezing" in the tuplesort buffer. When no duplicate key values are
> > encountered, the code is nearly optimal (except for a full tuple copy
> > to get the data into the GinBuffer's memory context), but if more than
> > one GinTuple has the same key in the merge phase we deserialize both
> > tuple's posting lists and merge the two. I suspect that merge to be
> > more expensive than operating on the compressed posting lists of the
> > GinTuples themselves, so that's something I think could be improved. I
> > suspect/guess it could save another 10% in select cases, and will
> > definitely reduce the memory footprint of the buffer.
> > Another thing that can be optimized is the current approach of
> > inserting data into the index: I think it's kind of wasteful to
> > decompress and later re-compress the posting lists once we start
> > storing the tuples on disk.
> >
>
> I need to experiment with this a bit more, to better understand the
> behavior and pros/cons. But one thing that's not clear to me is why
> would this be better than simply increasing the amount of memory for the
> initial BuildAccumulator buffer ...
>
> Wouldn't that have pretty much the same effect?

I don't think so:

The BuildAccumulator buffer will probably never be guaranteed to have
space for all index entries, though it does use memory more
efficiently than Tuplesort. Therefore, it will likely have to flush
keys multiple times into sorted runs, with likely duplicate keys
existing in the tuplesort.

My patch 0008 targets the reduction of IO and CPU once
BuildAccumulator has exceeded its memory limits. It reduces the IO and
computational requirement of Tuplesort's sorted-run merging by merging
the tuples in those sorted runs in that merge process, reducing the
number of tuples, bytes stored, and number of compares required in
later operations. It enables some BuildAccumulator-like behaviour
inside the tuplesort, without needing to assign huge amounts of memory
to the BuildAccumulator by allowing efficient spilling to disk of the
incomplete index data. And, during merging, it can work with
O(n_merge_tapes * tupsz) of memory, rather than O(n_distinct_keys *
tupsz). This doesn't make BuildAccumulator totally useless, but it
does reduce the relative impact of assigning more memory.

One significant difference between the modified Tuplesort and
BuildAccumulator is that the modified Tuplesort only merges the
entries once they're getting written, i.e. flushed from the in-memory
structure; while BuildAccumulator merges entries as they're being
added to the in-memory structure.

Note that this difference causes BuildAccumulator to use memory more
efficiently during in-memory workloads (it doesn't duplicate keys in
memory), but as BuildAccumulator doesn't have spilling it doesn't
handle full indexes' worth of data (it does duplciate keys on disk).

I hope this clarifies things a bit. I'd be thrilled if we'd be able to
put BuildAccumulator-like behaviour into the in-memory portion of
Tuplesort, but that'd require a significantly deeper understanding of
the Tuplesort internals than what I currently have, especially in the
area of its memory management.

Kind regards

Matthias van de Meent
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joe Conway 2024-07-08 15:38:54 Re: Commitfest manager for July 2024
Previous Message David E. Wheeler 2024-07-08 15:27:36 Re: ❓ JSON Path Dot Precedence