Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

From: Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Date: 2024-05-05 23:37:20
Message-ID: CANtu0og6P+O10XLm-AsoqOhZYEjr8SEHFETadSJ8ifO01YP1qg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, Matthias!

I just realized there is a much simpler and safe way to deal with the
problem.

So, d9d076222f5b94a85e0e318339cfc44b8f26022d(1) had a bug because the scan
was not protected by a snapshot. At the same time, we want this snapshot to
affect not all the relations, but only a subset of them. And there is
already a proper way to achieve that - different types of visibility
horizons!

So, to resolve the issue, we just need to create a separated horizon value
for such situation as building an index concurrently.

For now, let's name it `VISHORIZON_BUILD_INDEX_CONCURRENTLY` for example.
By default, its value is equal to `VISHORIZON_DATA`. But in some cases it
"stops" propagating forward while concurrent index is building, like this:

h->create_index_concurrently_oldest_nonremovable
=TransactionIdOlder(h->create_index_concurrently_oldest_nonremovable, xmin);
if (!(statusFlags & PROC_IN_SAFE_IC))
h->data_oldest_nonremovable =
TransactionIdOlder(h->data_oldest_nonremovable, xmin);

The `PROC_IN_SAFE_IC` marks backend xmin as ignored by `VISHORIZON_DATA`
but not by `VISHORIZON_BUILD_INDEX_CONCURRENTLY`.

After, we need to use appropriate horizon for relations which are processed
by `PROC_IN_SAFE_IC` backends. There are a few ways to do it, we may start
prototyping with `rd_indexisbuilding` from previous message:

static inline GlobalVisHorizonKind
GlobalVisHorizonKindForRel(Relation rel)
........
if (rel != NULL && rel->rd_indexvalid &&
rel->rd_indexisbuilding)
return VISHORIZON_BUILD_INDEX_CONCURRENTLY;

There are few more moments need to be considered:

* Does it move the horizon backwards?

It is allowed for the horizon to move backwards (like said in
`ComputeXidHorizons`) but anyway - in that case the horizon for particular
relations just starts to lag behind the horizon for other relations.
Invariant is like that: `VISHORIZON_BUILD_INDEX_CONCURRENTLY` <=
`VISHORIZON_DATA` <= `VISHORIZON_CATALOG` <= `VISHORIZON_SHARED`.

* What is about old cached versions of `Relation` objects without
`rd_indexisbuilding` yet set?

This is not a problem because once the backend registers a new index, it
waits for all transactions without that knowledge to end
(`WaitForLockers`). So, new ones will also get information about new
horizon for that particular relation.

* What is about TOAST?
To keep TOAST horizon aligned with relation building the index, we may do
the next thing (as first implementation iteration):

else if (rel != NULL && ((rel->rd_indexvalid &&
rel->rd_indexisbuilding) || IsToastRelation(rel)))
return VISHORIZON_BUILD_INDEX_CONCURRENTLY;

For the normal case, `VISHORIZON_BUILD_INDEX_CONCURRENTLY` is equal to
`VISHORIZON_DATA` - nothing is changed at all. But while the concurrent
index is building, the TOAST horizon is guaranteed to be aligned with its
parent relation. And yes, it is better to find an easy way to affect only
TOAST relations related to the relation with index building in progress.

New horizon adds some complexity, but not too much, in my opinion. I am
pretty sure it is worth being done because the ability to rebuild indexes
without performance degradation is an extremely useful feature.
Things to be improved:
* better way to track relations with concurrent indexes being built (with
mechanics to understood what index build was failed)
* better way to affect TOAST tables only related to concurrent index build
* better naming

Patch prototype in attachment.
Also, maybe it is worth committing test separately - it was based on Andrey
Borodin work (2). The test fails well in the case of incorrect
implementation.

[1]:
https://github.com/postgres/postgres/commit/d9d076222f5b94a85e0e318339cfc44b8f26022d#diff-8879f0173be303070ab7931db7c757c96796d84402640b9e386a4150ed97b179R1779-R1793
[2]: https://github.com/x4m/postgres_g/commit/d0651e7d0d14862d5a4dac076355

Attachment Content-Type Size
v1-0001-WIP-fix-d9d076222f5b-VACUUM-ignore-indexing-opera.patch text/x-patch 17.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-05-06 01:03:37 Re: Weird test mixup
Previous Message Joe Conway 2024-05-05 23:22:46 Re: 'trusted'/'untrusted' PL in DoD/DISA PostgreSQL STIGs