Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data
Date: 2021-07-29 12:30:56
Message-ID: B8B5720D-B325-42B1-A5EF-ED01725503FA@yandex-team.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

> 27 июля 2021 г., в 17:50, Andrey Borodin <x4mmm(at)yandex-team(dot)ru> написал(а):
>
>
> 3. It still breaks the fix and I have no idea why.

I'm still coping with the bug. I have a stable reproduction, but to the date unable to identify roots of the problem.
Here's the sample trace:
2021-07-29 17:14:28.996 +05 [61148] 002_cic_2pc.pl LOG: statement: REINDEX INDEX CONCURRENTLY idx;
2021-07-29 17:14:28.996 +05 [61148] 002_cic_2pc.pl WARNING: Phase 1
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING: Phase 2
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/3493
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING: XXX: PreparedXactLock xid 3493, vxid -1/3493
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/3490
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING: XXX: PreparedXactLock xid 3490, vxid -1/3490
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/3492
2021-07-29 17:14:28.997 +05 [61148] 002_cic_2pc.pl WARNING: XXX: PreparedXactLock xid 3492, vxid -1/3492
2021-07-29 17:14:28.999 +05 [61148] 002_cic_2pc.pl WARNING: Phase 3
2021-07-29 17:14:28.999 +05 [61148] 002_cic_2pc.pl WARNING: Phase 4
2021-07-29 17:14:29.000 +05 [61148] 002_cic_2pc.pl WARNING: Phase 5
2021-07-29 17:14:29.005 +05 [61148] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/3512
2021-07-29 17:14:29.005 +05 [61148] 002_cic_2pc.pl WARNING: XXX: PreparedXactLock xid 3512, vxid -1/3512
2021-07-29 17:14:29.006 +05 [61148] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/3513
2021-07-29 17:14:29.006 +05 [61148] 002_cic_2pc.pl WARNING: XXX: PreparedXactLock xid 3513, vxid -1/3513
2021-07-29 17:14:29.006 +05 [61148] 002_cic_2pc.pl WARNING: Phase 6
2021-07-29 17:14:29.006 +05 [61148] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/3516
2021-07-29 17:14:29.006 +05 [61148] 002_cic_2pc.pl WARNING: XXX: PreparedXactLock xid 3516, vxid -1/3516
2021-07-29 17:14:29.006 +05 [61148] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/3515
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING: XXX: PreparedXactLock xid 3515, vxid -1/3515
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid -1/3517
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING: XXX: PreparedXactLock xid 3517, vxid -1/3517
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING: XXX: VirtualXactLock vxid 4/1002
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING: Backend is doing something else
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING: XXX: PreparedXactLock xid 0, vxid 4/1002
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING: XXX: Sucessfully found xid by vxid 0
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl WARNING: Phase Final
2021-07-29 17:14:29.007 +05 [61148] 002_cic_2pc.pl LOG: statement: SELECT bt_index_check('idx',true);
2021-07-29 17:14:29.009 +05 [61148] 002_cic_2pc.pl ERROR: heap tuple (18,74) from table "tbl" lacks matching index tuple within index "idx" xmin 3504 xmax 0
2021-07-29 17:14:29.009 +05 [61148] 002_cic_2pc.pl STATEMENT: SELECT bt_index_check('idx',true);

Rogue tuple was committed by xid 3504 which was never returned by GetLockConflicts(). I'm attaching patch set just for the reference of the trace, not expecting code review now.

I've fixed unrelated bug in previous patchset.
- SET_LOCKTAG_TRANSACTION(tag, xid);
+ SET_LOCKTAG_TRANSACTION(tag, xidlist.xid);
lar = LockAcquire(&tag, ShareLock, false, !wait);
if (lar != LOCKACQUIRE_NOT_AVAIL)
LockRelease(&tag, ShareLock, false);
- return lar != LOCKACQUIRE_NOT_AVAIL;
+ if (lar == LOCKACQUIRE_NOT_AVAIL)
+ return false;

But it does not help. Maybe I've broken something by my fix...Searching further.

Thanks for reading! Would be happy to hear any ideas why xid was not locked by GetLockConflicts() bug committed a tuple which was not indexed.

Best regards, Andrey Borodin.

Attachment Content-Type Size
v6-0001-Introduce-TAP-test-for-2PC-with-CIC-behavior.patch application/octet-stream 7.0 KB
v6-0002-Fix-CREATE-INDEX-CONCURRENTLY-in-precence-of-vxid.patch application/octet-stream 8.2 KB
v6-0003-Various-debug-stuff-and-fix-in-xidlist.xid.patch application/octet-stream 9.1 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Andrey Borodin 2021-07-29 14:21:44 Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data
Previous Message Kyotaro Horiguchi 2021-07-29 07:50:07 Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size