Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

From: Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Subject: Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Date: 2024-09-01 21:19:00
Message-ID: CANtu0oh4PwBn_h+4p_MxFigRAyJvF-0nA9Tm5NFRwfsWWjZQiA@mail.gmail.com
Lists: pgsql-hackers

Hello, Matthias!

Just wanted to give you an update on the next steps of this work.

> In heapam_index_build_range_scan, it seems like you're popping the
> snapshot and registering a new one while holding a tuple from
> heap_getnext(), thus while holding a page lock. I'm not so sure that's
> OK, especially when catalogs are also involved (specifically for
> expression indexes, where functions could potentially be updated or
> dropped if we re-create the visibility snapshot)

I have returned to the solution with a dedicated catalog_xmin for backends
[1].
Additionally, I have added catalog_xmin to pg_stat_activity [2].

> In heapam_index_build_range_scan, you pop the snapshot before the
> returned heaptuple is processed and passed to the index-provided
> callback. I think that's incorrect, as it'll change the visibility of
> the returned tuple before it's passed to the index's callback. I think
> the snapshot manipulation is best added at the end of the loop, if we
> add it at all in that function.

Now it's fixed, and the snapshot is reset between pages [3].
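
To illustrate just the ordering (a toy Python model, not code from the
patch; ToyScan, reset_snapshot and build_index are made-up names, and the
"snapshot" is only a counter): each tuple is checked and handed to the
callback under the same snapshot, and the reset happens only once a page is
finished and no tuple is held.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScan:
    """Hypothetical stand-in for a heap scan; not PostgreSQL code."""
    pages: list                 # each page is a list of tuple ids
    snapshot_id: int = 0        # bumped every time the snapshot is "reset"
    log: list = field(default_factory=list)

    def reset_snapshot(self):
        # Stand-in for popping the old snapshot and registering a new one.
        self.snapshot_id += 1

    def run(self, callback):
        for page in self.pages:
            for tup in page:
                checked_under = self.snapshot_id   # "visibility check"
                callback(tup)                      # index build callback
                # Record which snapshot was current before and after the
                # callback; they must match for every tuple.
                self.log.append((tup, checked_under, self.snapshot_id))
            # The page is fully processed and no tuple is held:
            # this is the only place the snapshot is reset.
            self.reset_snapshot()


def build_index(pages):
    index = []
    scan = ToyScan(pages)
    scan.run(index.append)
    return index, scan
```

With three pages the snapshot is reset three times, and no tuple ever sees
a snapshot change between its visibility check and its callback.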

Additionally, I resolved the issue with potential duplicates in unique
indexes. It looks a bit clunky, but it works for now [4].

A single commit from [5] is also included, just to keep stress testing
stable.

The full diff is available at [6].

Best regards,
Mikhail.

[1]:
https://github.com/michail-nikolaev/postgres/commit/01a47623571592c52c7a367f85b1cff9d8b593c0
[2]:
https://github.com/michail-nikolaev/postgres/commit/d3345d60bd51fe2e0e4a73806774b828f34ba7b6
[3]:
https://github.com/michail-nikolaev/postgres/commit/7d1dd4f971e8d03f38de95f82b730635ffe09aaf
[4]:
https://github.com/michail-nikolaev/postgres/commit/4ad56e14dd504d5530657069068c2bdf172e482d
[5]: https://commitfest.postgresql.org/49/5160/
[6]:
https://github.com/postgres/postgres/compare/master...michail-nikolaev:postgres:new_index_concurrently_approach?diff=split&w=
