Re: Parallel CREATE INDEX for GIN indexes

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Kirill Reshke <reshkekirill(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel CREATE INDEX for GIN indexes
Date: 2025-02-25 15:49:03
Message-ID: 08b57e98-2fd8-4372-bd1f-b15a010f171b@vondra.me
Lists: pgsql-hackers

One more patch version / rebase. I've been planning to get 0001
committed, but I realized there's one more loose end - progress reporting.

I could have committed 0001 without progress reporting, I guess, but
Matthias actually mentioned this a couple of days ago, so I took a stab
at it. The build goes through these 5 stages (on top of "INITIALIZE"):

PROGRESS_GIN_PHASE_INDEXBUILD_TABLESCAN
PROGRESS_GIN_PHASE_PERFORMSORT_1
PROGRESS_GIN_PHASE_MERGE_1
PROGRESS_GIN_PHASE_PERFORMSORT_2
PROGRESS_GIN_PHASE_MERGE_2

The phases up to PROGRESS_GIN_PHASE_MERGE_1 happen in workers, i.e. the
worker part ends with workers feeding the sorted/merged data into the
shared tuplesort. The last two phases happen in the leader, which merges
the data and actually inserts it into the GIN index.
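
To make the worker/leader split concrete, here is a minimal sketch in C.
The enum and its numbering are hypothetical (the actual macro values are
defined in the patch and not shown in this mail), and "INITIALIZE" is
left out because the text above does not say where it runs. Only the
ordering of the five phases is taken from the list above.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the PROGRESS_GIN_PHASE_* macros. The numeric
 * values are an assumption; only the ordering follows the mail.
 */
typedef enum GinBuildPhaseSketch
{
    SKETCH_PHASE_TABLESCAN = 1,
    SKETCH_PHASE_PERFORMSORT_1,
    SKETCH_PHASE_MERGE_1,
    SKETCH_PHASE_PERFORMSORT_2,
    SKETCH_PHASE_MERGE_2
} GinBuildPhaseSketch;

/*
 * Returns true for phases that run in workers (everything up to MERGE_1),
 * false for the two final phases that run in the leader.
 */
bool
sketch_phase_runs_in_worker(GinBuildPhaseSketch phase)
{
    assert(phase >= SKETCH_PHASE_TABLESCAN && phase <= SKETCH_PHASE_MERGE_2);
    return phase <= SKETCH_PHASE_MERGE_1;
}
```

With this ordering, a single comparison against MERGE_1 is enough to
tell the two groups apart.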

The "parallel" part has the blocks_done/blocks_total showing progress,
per the parallel scan. The "leader" phases use tuples_done/tuples_total,
where "tuple" is the GIN tuple produced by workers (each worker reports
the number of "tuples" it writes into the shared tuplesort, the leader
then tracks how many it processed).
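
The tuples_done/tuples_total accounting for the leader phases can be
sketched roughly like this. It is a single-process illustration only:
the struct and function names are made up for the example, and the real
patch has to pass the per-worker counts through shared state, which is
ignored here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical progress state; not the data structure used by the patch. */
typedef struct GinProgressSketch
{
    int64_t tuples_total;   /* sum of GIN tuples reported by workers */
    int64_t tuples_done;    /* GIN tuples the leader has processed so far */
} GinProgressSketch;

/* A worker reports how many GIN tuples it wrote into the shared tuplesort. */
void
sketch_worker_report(GinProgressSketch *p, int64_t ntuples)
{
    assert(ntuples >= 0);
    p->tuples_total += ntuples;
}

/* The leader records that it has processed some more GIN tuples. */
void
sketch_leader_advance(GinProgressSketch *p, int64_t ntuples)
{
    assert(ntuples >= 0);
    p->tuples_done += ntuples;
    assert(p->tuples_done <= p->tuples_total);
}

/* Fraction of the leader's work that is finished, in the range 0.0 to 1.0. */
double
sketch_fraction_done(const GinProgressSketch *p)
{
    if (p->tuples_total == 0)
        return 0.0;
    return (double) p->tuples_done / (double) p->tuples_total;
}
```

For example, if two workers report 300 and 100 tuples and the leader has
processed 100 of them, the sketch gives tuples_total = 400 and
tuples_done = 100.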

I think this works pretty nicely. I'm not entirely sure we need all the
phases, maybe it'd be fine to have the sort+merge as a single phase? Or
maybe there should be one extra "sort" phase? Workers do two sorts,
first on their "private" tuplesort, then on the "shared" one.

What annoys me a little bit is that we only see those stages if the
leader participates as a worker. With parallel_leader_participation=off
none of this is visible, although we still see the blocks from the scan.

regards

--
Tomas Vondra

Attachment                                                        Content-Type  Size
v20250225-0001-Allow-parallel-CREATE-INDEX-for-GIN-indexe.patch   text/x-patch  68.8 KB
v20250225-0002-cleanup.patch                                      text/x-patch  5.5 KB
v20250225-0003-progress.patch                                     text/x-patch  8.3 KB
v20250225-0004-Compress-TID-lists-when-writing-GIN-tuples.patch   text/x-patch  8.2 KB
v20250225-0005-Enforce-memory-limit-during-parallel-GIN-b.patch   text/x-patch  12.3 KB
v20250225-0006-Use-a-single-GIN-tuplesort.patch                   text/x-patch  32.2 KB
v20250225-0007-WIP-parallel-inserts-into-GIN-index.patch          text/x-patch  18.4 KB
