Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Date: 2023-12-15 21:11:59
Message-ID: CAEze2WgW6pj48xJhG_YLUE1QS+n9Yv0AZQwaWeb-r+X=HAxU_g@mail.gmail.com
Lists: pgsql-hackers

On Fri, 15 Dec 2023, 20:07 Michail Nikolaev, <michail(dot)nikolaev(at)gmail(dot)com>
wrote:

> Hello, hackers!
>
> I'm thinking about revisiting [1] ({CREATE INDEX, REINDEX} CONCURRENTLY
> improvements) in a lighter form.
>
> Yes, a serious bug [2] was caused by this optimization, and it has now been reverted.
>
> But what about a safer idea in that direction:
> 1) add a new horizon which ignores PROC_IN_SAFE_IC backends and standby
> queries
> 2) use this horizon for setting LP_DEAD bits in indexes (excluding
> indexes being built, of course)
>
> Index LP_DEAD hints are not used by standbys in any way (they are simply
> ignored), and the heap scan done by index building does not use them
> either.
>
> But, at the same time:
> 1) index scans will be much faster during index creation or while standby
> reporting queries are running
> 2) indexes can keep themselves compact using the various existing
> optimizations
> 3) less WAL, by avoiding the huge number of full-page writes caused by
> tons of LP_DEAD bits being set in indexes
>
> The patch seems more or less easy to implement.
> Is it worth implementing? Or is it too scary?
>

I highly doubt this is worth the additional cognitive overhead of another
liveness state, and I think there might be other issues with marking index
tuples dead in indexes before the table tuple is dead that I can't think of
right now.

I've thought about alternative solutions, too: how about getting a new
snapshot every so often?
We don't really care about the liveness of the already-scanned data; the
snapshots used for C/RIC (CREATE INDEX / REINDEX CONCURRENTLY) are used only
during the scan. The lock level C/RIC takes on the relation means VACUUM
can't run on it to clean up dead line items, so as long as we only swap the
backend's reported snapshot (and thus its xmin) while the scan is between
pages, we should be able to reduce the time during which C/RIC is the one
backend holding back cleanup of old tuples.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

> [1]: https://postgr.es/m/20210115133858.GA18931@alvherre.pgsql
> [2]: https://postgr.es/m/17485-396609c6925b982d%40postgresql.org
