Re: Parallel CREATE INDEX for GIN indexes

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Kirill Reshke <reshkekirill(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel CREATE INDEX for GIN indexes
Date: 2025-02-17 13:01:09
Message-ID: CAEze2WhzcMYbinvExaq=-L6G+NsyLMPz7uYxkMYg1C9wDtDFyQ@mail.gmail.com
Lists: pgsql-hackers

On Sun, 16 Feb 2025 at 04:47, Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>
> Hi,
>
> Attached is a cleaned up version of the patch series, squashed into
> fewer patches as discussed. I also went through all the comments, and
> removed/updated some obsolete ones. I also updated the commit messages,
> it'd be nice if someone could read through those, to make sure it's
> clear enough.
>
> While cleaning the comments, I realized there's a couple remaining XXX
> and FIXME comments, with some valid open questions.
>
> 1) There are two places that explicitly zero memory, suggesting it's
> because of padding causing issues in valgrind (in tuplesort). I need to
> check if that's still true, but I wonder how the other tuplesort
> variants manage to write stuff without tripping valgrind. Maybe the GinTuple is
> too unique.

> 2) ginBuildCallbackParallel says this about memory limits:
>
> * XXX It might seem this should set the memory limit to 32MB, same as
> * what plan_create_index_workers() uses to calculate the number of
> * parallel workers, but that's the limit for tuplesort. So it seems
> * better to keep using work_mem here.
> *
> * XXX But maybe we should calculate this as a per-worker fraction of
> * maintenance_work_mem. It's weird to use work_mem here, in a clearly
> * maintenance command.
>
> The function uses work_mem to limit the amount of memory used by each
> worker, which seems a bit strange - it's a maintenance operation, so it
> would be more appropriate to use maintenance_work_mem I guess.
>
> I see the btree code also uses work_mem in some cases when building the
> index, although that uses it to size the tuplesort. And here we have
> both the tuplesorts (sized just like in nbtree code), but also the
> buffer used to accumulate entries.

I think that's a bug in the btree code.

> I wonder if maybe the right solution would be to use half the allowance
> for tuplesort and half for the buffer. In the workers the allowance is
>
> maintenance_work_mem / ginleader->nparticipanttuplesorts
>
> while in the leader it's maintenance_work_mem. Opinions?

Why is the allowance in the leader not affected by memory usage of
parallel workers? Shouldn't that also be m_w_m / nparticipants?

IIRC, in nbtree, the leader will use (n_planned_part -
n_launched_part) * (m_w_m / n_planned_part), which in practice is 1 *
(m_w_m / n_planned_part).

> 3) There's a XXX comment suggesting to use a separate memory context for
> the GinBuffer, but I decided it doesn't seem really necessary. We're not
> running any complex function or anything like that in this code, so I
> don't see a huge benefit of a separate context.
>
> I know the patch reworking this to use a single tuplesort actually adds
> the memory context, maybe it's helpful for that patch. But for now I
> don't see the point.

I think I added that for the reduced cost of memory cleanup using mctx
resets vs repeated pfree(), when we're in the tuplesort merge phase.
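
(Roughly this pattern, as a sketch against PostgreSQL's memory-context
API rather than the actual patch code:)

```c
/*
 * With a dedicated context, one reset frees everything a merge round
 * allocated, regardless of how many chunks there were:
 */
MemoryContext buf_ctx = AllocSetContextCreate(CurrentMemoryContext,
											  "GinBuffer",
											  ALLOCSET_DEFAULT_SIZES);
MemoryContext old = MemoryContextSwitchTo(buf_ctx);
/* ... accumulate entries for one merge round ... */
MemoryContextSwitchTo(old);
MemoryContextReset(buf_ctx);	/* cost scales with blocks, not allocations */

/*
 * Without it, each allocation has to be released individually:
 * for each entry: pfree(entry->key); pfree(entry); ...
 */
```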

> 4) The patch saving 12B in the GinTuple also added this comment:
>
> * XXX: Update description with new architecture
>
> but I'm a bit unsure what exactly is meant by "architecture", or what
> I should add a description for.

I worked on both 'smaller GinTuple' and the 'single tuplesort' patch
in one go, without any intermediate commits to delineate work on
either patch, and picked the changes for 'smaller GinTuple' from that
large pile of changes. That XXX was supposed to go into the second
patch, and was there to signal me to update the overarching
architecture documentation for Parallel GIN index builds (which I
subsequently forgot to do with the 'single tuplesort' patch). So, it's
not relevant to the 12B patch, but could be relevant to what is now
patch 0004.

Kind regards,

Matthias van de Meent
