Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

From:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To:	Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>
Cc:	Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Subject:	Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Date:	2025-01-04 01:12:58
Message-ID:	CAEze2Wj1sY+A6zKJtWhdXSmA9wjmP8jiazHgyH1O37NnSjH5SQ@mail.gmail.com
Lists:	pgsql-hackers

On Wed, 1 Jan 2025 at 17:17, Michail Nikolaev
<michail(dot)nikolaev(at)gmail(dot)com> wrote:
>
> Hello, everyone!
>
> I’ve added several updates to the patch set:
>
> * Automatic auxiliary index removal where applicable.
> * Documentation updates to reflect recent changes.
> * Optimization for STIR indexes: skipping datum setup, as they store only TIDs.
> * Numerous assertions to ensure that MyProc->xmin is invalid where necessary.
>
> I’d like to share some initial benchmark results (see attached graphs).
> This involves building a B-tree index on (aid, abalance) in a pgbench setup with scale 2000 (with WAL), while running a concurrent pgbench workload.
>
> The patched version built the index in 68 seconds, compared to 117 seconds with the master branch (mostly because of a single heap scan).
> There appears to be no effect on the throughput of the concurrent pgbench.
> The maximum snapshot age remains near zero.

Thank you for continuing to work on this; these are some nice results.
I'm sorry I can't spend as much time on this as I'd like, but I
still think it's important that this eventually gets committed, so thank
you for your work.

> (mostly because of a single heap scan).

Isn't there a second heap scan, or do you consider that an index scan?

> I am going to continue to benchmark with different options: different HOT setup, unique index, different index types and DB size (100+ GB).
> If someone has some ideas about possible benchmark scenarios - please share.

I think a good benchmark could show how bloat is actually prevented,
i.e. by comparing the resulting table sizes on an update-heavy
workload, both with and without the patch.
I think it shouldn't be too difficult to show how such workloads
quickly regress to always extending the table when no cleanup can
happen, while with the patch they'd have much more leeway due to page
pruning. Presumably a table with a fillfactor <100 would show the best
results.
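
As a rough sketch of how such a comparison could be scored: sample the
heap size before and after the concurrent index build on each branch,
then compare the relative growth. The helper names below are
hypothetical, and the use of pgbench_accounts as the measured table is
an assumption based on the pgbench setup quoted above; none of this is
part of the patch set.

```python
from typing import Dict, Tuple

# Assumption: run this via psql before and after the index build.
# pg_relation_size() reports the size of the table's main fork in bytes.
SIZE_QUERY = "SELECT pg_relation_size('pgbench_accounts');"


def growth_pct(bytes_before: int, bytes_after: int) -> float:
    """Heap growth during the build, as a percentage of the starting size."""
    if bytes_before <= 0:
        raise ValueError("bytes_before must be positive")
    return 100.0 * (bytes_after - bytes_before) / bytes_before


def summarize(runs: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Map each run label (e.g. 'master', 'patched') to its heap growth in %.

    Each value in `runs` is a (bytes_before, bytes_after) pair obtained
    from SIZE_QUERY; the labels are whatever the benchmarker chooses.
    """
    return {label: growth_pct(before, after)
            for label, (before, after) in runs.items()}
```

A lower percentage for the patched run on the same workload would
suggest that pruning kept up during the build; repeating the runs at
several fillfactor settings would show where the effect is largest.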

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

In response to

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements at 2025-01-01 16:16:00 from Michail Nikolaev

Responses

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements at 2025-01-06 13:36:00 from Michail Nikolaev
