Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data
Date: 2021-07-31 08:37:50
Message-ID: D22DEA09-80DA-4350-839B-0FC0BD0668A4@yandex-team.ru
Lists: pgsql-bugs

> On 30 July 2021, at 23:41, Noah Misch <noah(at)leadboat(dot)com> wrote:
>
> On Fri, Jul 30, 2021 at 03:42:10PM +0500, Andrey Borodin wrote:
>>> On 30 July 2021, at 07:25, Noah Misch <noah(at)leadboat(dot)com> wrote:
>>> What alternative fix designs should we consider?
>>
>> I observe that the provided patch fixes CIC under normal transactions, but the test with 2PC still fails similarly.
>> The unindexed tuple was committed somewhere at the end of Phase 3 or 4.
>> 2021-07-30 15:35:31.806 +05 [25987] 002_cic_2pc.pl LOG: statement: REINDEX INDEX CONCURRENTLY idx;
>> 2021-07-30 15:35:31.806 +05 [25987] 002_cic_2pc.pl WARNING: Phase 1
>> 2021-07-30 15:35:31.806 +05 [25987] 002_cic_2pc.pl WARNING: Phase 2
>> 2021-07-30 15:35:31.806 +05 [25987] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/6735
>> 2021-07-30 15:35:31.807 +05 [25987] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/6736
>> 2021-07-30 15:35:31.808 +05 [25987] 002_cic_2pc.pl WARNING: Phase 3
>> 2021-07-30 15:35:31.808 +05 [25987] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/6750
>> 2021-07-30 15:35:31.809 +05 [25987] 002_cic_2pc.pl WARNING: Phase 4
>> 2021-07-30 15:35:31.809 +05 [25987] 002_cic_2pc.pl WARNING: Phase 5
>> 2021-07-30 15:35:31.809 +05 [25987] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/6762
>> 2021-07-30 15:35:31.809 +05 [25987] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/6763
>> 2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING: Phase 6
>> 2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid 6/2166
>> 2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/6767
>> 2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/6764
>> 2021-07-30 15:35:31.810 +05 [25987] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/6765
>> 2021-07-30 15:35:31.811 +05 [25987] 002_cic_2pc.pl WARNING: Phase Final
>> 2021-07-30 15:35:31.811 +05 [25987] 002_cic_2pc.pl LOG: statement: SELECT bt_index_check('idx',true);
>> 2021-07-30 15:35:31.813 +05 [25987] 002_cic_2pc.pl ERROR: heap tuple (46,16) from table "tbl" lacks matching index tuple within index "idx" xmin 6751 xmax 0
>
> I see a failure, too. Once again, "i:" lines are events within the INSERT
> backend, and "r:" lines are events within the REINDEX CONCURRENTLY backend:
>
> r: Phase 2 begins.
> i: INSERT. Start PREPARE.
> r: Phase 2 commits indisready=t for idx_ccnew.
> r: Start waiting for the INSERT to finish.
> i: PREPARE finishes.
> r: Wake up and start validate_index(). This is a problem. It needed to wait
> for COMMIT PREPARED to finish.
I'll investigate this scenario. I've tried sprinkling in a few more WaitForLockersMultiple() calls, so far without success.
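
To make the timeline above concrete, here is a minimal toy replay in Python. It is only a sketch under stated assumptions: `ToyLockTable`, `replay`, and the identifiers `3/100` and `1000` are hypothetical and are not PostgreSQL APIs or values from the test run. It assumes that PREPARE gives up the inserting backend's vxid lock while the xid stays in progress until COMMIT PREPARED, which is one way the early wake-up in the timeline could come about.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLockTable:
    """Toy stand-in for the lock manager; hypothetical, not PostgreSQL's API."""
    held_vxids: set = field(default_factory=set)
    running_xids: set = field(default_factory=set)

    def begin(self, vxid: str, xid: int) -> None:
        self.held_vxids.add(vxid)
        self.running_xids.add(xid)

    def prepare(self, vxid: str) -> None:
        # Assumption: PREPARE drops the backend's vxid lock, while the xid
        # stays in progress until COMMIT PREPARED.
        self.held_vxids.discard(vxid)

    def commit_prepared(self, xid: int) -> None:
        self.running_xids.discard(xid)


def replay(wait_on: str) -> bool:
    """Replay the timeline; return True if the waiter wakes before COMMIT PREPARED."""
    locks = ToyLockTable()
    vxid, xid = "3/100", 1000   # made-up identifiers
    locks.begin(vxid, xid)      # i: INSERT. Start PREPARE.
    # r: Phase 2 commits indisready=t, then starts waiting for the INSERT.
    locks.prepare(vxid)         # i: PREPARE finishes.
    if wait_on == "vxid":
        woke_early = vxid not in locks.held_vxids
    else:
        woke_early = xid not in locks.running_xids
    locks.commit_prepared(xid)  # COMMIT PREPARED arrives only later
    return woke_early


print(replay("vxid"))  # True: the waiter proceeds to validate_index() too soon
print(replay("xid"))   # False: the waiter is still blocked at this point
```

In this model, a waiter keyed only on the vxid wakes as soon as PREPARE finishes, while a waiter keyed on the xid stays blocked until COMMIT PREPARED.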

> This may have a different explanation than the failure you saw, because my
> INSERT transaction already had a permanent XID before the start of phase 3. I
> won't have time to study this further in the next several days. Can you find
> out where things go wrong?
I'll try. This bug is my #1 priority. We repack ~PB of indexes each weekend (only the bloated ones, but many of them are in fact bloated), and it seems that all of them are at risk.

> The next thing I would study is VirtualXactLock(),
> specifically what happens if the lock holder is a normal backend (with or
> without an XID) when VirtualXactLock() starts but becomes a prepared
> transaction (w/ different PGPROC) before VirtualXactLock() ends.

PreparedXactLock() will do the trick. If we have an xid, we always take a lock on the xid. If we have only a vxid, we try to convert it to an xid by looking through all PGPROCs for 2PC transactions, and then again wait on the xid.
At this point I'm certain that any transaction reported by GetLockConflicts() will be awaited by VirtualXactLock().
The problem is that the rogue transaction was never reported by GetLockConflicts().
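
The wait logic described above can be sketched as a small Python model. This is an illustration only, not the patch itself: `ToyProc`, `xid_to_await`, and `awaited_xids` are hypothetical names, and the vxid-to-xid lookup is assumed to be a simple scan over prepared entries. The sample data also shows the remaining gap: a transaction that is never put on the conflict list is never awaited, however careful the waiting code is.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyProc:
    """Hypothetical, simplified view of a PGPROC entry."""
    xid: Optional[int]
    vxid: Optional[str]
    is_prepared: bool = False


def xid_to_await(target: ToyProc, all_procs: List[ToyProc]) -> Optional[int]:
    """Pick the xid to wait on for one reported lock holder."""
    if target.xid is not None:
        return target.xid            # we have an xid: wait on it directly
    for proc in all_procs:           # only a vxid: look for a matching 2PC entry
        if proc.is_prepared and proc.vxid == target.vxid:
            return proc.xid
    return None                      # nothing to convert to


def awaited_xids(conflicts: List[ToyProc], all_procs: List[ToyProc]) -> List[int]:
    """Return the xids that would be waited on for the reported conflicts."""
    result = []
    for target in conflicts:
        xid = xid_to_await(target, all_procs)
        if xid is not None:
            result.append(xid)
    return result


# Made-up data: two prepared transactions exist, but only one is reported.
procs = [
    ToyProc(xid=1000, vxid="3/100", is_prepared=True),
    ToyProc(xid=1001, vxid="4/7", is_prepared=True),
]
reported = [ToyProc(xid=None, vxid="3/100")]

print(awaited_xids(reported, procs))  # [1000]
```

Here xid 1001 plays the role of the rogue transaction: it is in progress, but since it is absent from the reported conflicts, nothing ever waits for it.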

Thanks!

Best regards, Andrey Borodin.
