From: | Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com> |
Subject: | Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements |
Date: | 2024-12-25 15:14:00 |
Message-ID: | CANtu0og-4pvn4+TCWH6U9ghyd7x7NBAZSgi4ZWyBZdBWH6OpWA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello, Michael!
Thank you for your comments and feedback!
Yes, this patch set contains a significant amount of code, which makes it
challenging to review. Some details are explained in the commit messages,
but I’m doing my best to structure the patch set in a way that is as
committable as possible. Once all the parts are ready, I plan to write a
detailed letter explaining everything, including benchmark results and
other relevant information.
Meanwhile, here’s a quick overview of the patch structure. If you have
suggestions for an alternative decomposition approach, I’d be happy to hear.
The primary goals of the patch set are to:
* Enable the xmin horizon to propagate freely during concurrent index
builds
* Build concurrent indexes with a single heap scan
The patch set is split into the following parts. Technically, each part
could be committed separately, but all of them are required to achieve the
goals.
Part 1: Stress tests
- 0001: Yes, this patch is from another thread and not directly required,
it’s included here as a single commit because it’s necessary for stress
testing this patch set. Without it, issues with concurrent reindexing and
upserts cause failures.
- 0002: Yes, I agree these tests need to be refactored or moved into a
separate task. I’ll address this later.
Part 2: During the first phase of concurrently building a index, reset the
snapshot used for heap scans between pages, allowing xmin to go forward.
- 0003: Implement such snapshot resetting for non-parallel and non-unique
cases
- 0004: Extends snapshot resetting to parallel builds
- 0005: Extends snapshot resetting to unique indexes
Part 3: Build concurrent indexes in a single heap scan
- 0006: Introduces the STIR (Short-Term Index Replacement) access method, a
specialized method for auxiliary indexes during concurrent builds
- 0007: Implements the auxiliary index approach, enabling concurrent index
builds to use a single heap scan.
In a few words, it works like this: create an empty auxiliary
STIR index to track new tuples, scan heap and build new index, merge STIR
tuples into new index, drop auxiliary index.
- 0008: Enhances the auxiliary index approach by resetting snapshots during
the merge phase, allowing xmin to propagate
Part 4: This part depends on all three previous parts being committed to
make sense (other parts are possible to apply separately).
- 0009: Remove PROC_IN_SAFE_IC logic, as it is no more required
I have a plan to add a few additional small things (optimizations) and then
do some scaled stress-testing and benchmarking. I think that without it, no
one is going to spend his time for such an amount of code :)
Merry Christmas,
Mikhail.
From | Date | Subject | |
---|---|---|---|
Next Message | Vladlen Popolitov | 2024-12-25 15:55:51 | Re: Windows UTF8 system locale |
Previous Message | vignesh C | 2024-12-25 14:37:26 | Documentation update of wal_retrieve_retry_interval to mention table sync worker |