From: | Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com> |
---|---|
To: | alvherre(at)alvh(dot)no-ip(dot)org |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
Subject: | Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements |
Date: | 2025-02-04 01:38:00 |
Message-ID: | CANtu0ojiVez054rKvwZzKNhneS2R69UXLnw8N9EdwQwqfEoFdQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello, Alvaro!
I want to bring your attention to that patch because I think (and hope :P)
you might be interested in it since it all began with your work in 2021 [0].
That feature (ability to create\reindex indexes concurrently without
impacting vacuum horizon) made my life better :) Unfortunately, due to [1]
only for short period.
As you said in [3] :
> Deciding to revert makes me sad, because this feature is extremely
valuable for users
so, I highly agree with you here.
It is started with some ideas about the smaller patch scope but ended as:
• [CREATE|RE]INDEX CONCURRENTLY affects vacuum just for a few transactions
(snapshots are reset regularly)
• CI/RC is achieved in (almost) single heap scan (yes, about 3x-2x faster
in many cases)
• all core MVCC-related code is unaffected, everything is protected using
regular snapshots (as I remember Anders was against any changes into that
part)
• feature was actively tested for correctness - I have found five other
issues trying to find bugs in the patch (including [4] - bug is amcheck
itself, which I was using for testing indexes for correctness under the
stress, it was a tough story).
• benchmark shows great results (see attachments) and [2] and [5] and [6]
for more results, details and explanations
In a few words, it works like this:
• before building the index, an auxiliary index of the new empty STIR
(short-term index replacement) access method is created (for the same
columns, predicates, etc.). STIR in unlogged and only stores TIDs of new
coming tuples (datums are not even prepared for it during insert if
possible)
• during the first scan of heap, snapshot used for scan is being reset
every few pages, allowing xmin to propagate (in case of unique index we
also need some additional logic to handle correctness)
• instead of the second heap scan – we just check tids of target and
auxiliary indexes - and insert everything present in STIR but absent in the
target index (also with resetting snapshots every few pages during that)
• auxiliary STIR index then dropped (it also dropped in other cases to
avoid burden for DB administrators)
I have split the patch into 12 commits, some parts may be committed
separately. Some explanation about separation of patches may be found at
[7]. I have tried to structure them as much as possible (each improves some
small part of the whole set). Commit messages explain changes (I hope).
I may provide any additional details you may need – feel free to ask. Also,
I have some infrastructure for benchmarks and validation tests, so, you if
you want to check/test – I am happy to help.
I know it may feel like a naïve “miracle” patch from a dummy (2x index
building speedup without affecting horizon, aha) – but give it a chance.
Also, the last version of the patch in attach.
Best regards,
Mikhail.
[0]:
https://www.postgresql.org/message-id/20210115133858.GA18931@alvherre.pgsql
[1]:
https://www.postgresql.org/message-id/17485-396609c6925b982d%40postgresql.org
[2]:
https://discord.com/channels/1258108670710124574/1259884843165155471/1334565506149253150
[3]:
https://www.postgresql.org/message-id/flat/202205251643.2py5jjpaw7wy%40alvherre.pgsql#589508d30b480b905619dcb93dac8cf8
[4]:
https://www.postgresql.org/message-id/flat/CAH2-WzmcFDK2OzziTgdHxPTmaRQmSFLoDjS-C06uWGTsXibx9g%40mail.gmail.com#e4f4e414363944fc50c383c2a7dc5582
[5]:
https://www.postgresql.org/message-id/flat/CANtu0ojHAputNCH73TEYN_RUtjLGYsEyW1aSXmsXyvwf%3D3U4qQ%40mail.gmail.com#b18fb8efab086bc22af1cb015e187cb7
[6]:
https://www.postgresql.org/message-id/flat/CANtu0oi7d0_8oHpDPi_vFsuD0h71LNL4U2XXg0kq7iY_Ys3%2BSA%40mail.gmail.com#bc1c6f6718b6a7598c1f8e859399a889
[7]:
https://www.postgresql.org/message-id/flat/CANtu0og-4pvn4%2BTCWH6U9ghyd7x7NBAZSgi4ZWyBZdBWH6OpWA%40mail.gmail.com#1f000e705dc1968e8512d649c2923144
From | Date | Subject | |
---|---|---|---|
Next Message | wenhui qiu | 2025-02-04 01:41:51 | Re: New GUC autovacuum_max_threshold ? |
Previous Message | Zhang Mingli | 2025-02-04 01:31:06 | Re: Prevent COPY FREEZE on Foreign tables |