From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data |
Date: | 2021-07-30 02:25:48 |
Message-ID: | 20210730022548.GA1940096@gust.leadboat.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Thu, Jul 29, 2021 at 07:21:44PM +0500, Andrey Borodin wrote:
> Currently CIC test breaks indexes even without 2PC. How is it supposed to work if vxid stays vxid through GetLockConflicts()\WaitForLockersMultiple() barrier and then suddenly converts to xid and commits before index validated?
It's an inval race condition. Here's the sequence of events leading to a
failure. "i:" lines are events within the INSERT backend, and "r:" lines are
events within the REINDEX CONCURRENTLY backend:
r: Phase 2 begins.
i: Start INSERT.
i: Acquire RowExclusiveLock on "idx_ccnew".
i: RelationBuildDesc() begins for relation "idx_ccnew".
i: RelationBuildDesc() calls RelationInitIndexAccessInfo(), which loads
indisready=f.
r: Phase 2 commits indisready=t for idx_ccnew.
r: Start waiting for the INSERT to finish.
i: LocalExecuteInvalidationMessage() calls RelationCacheInvalidateEntry() for
the OID of relation "idx_ccnew". Its RelationIdCacheLookup() call returns
NULL, because nothing has passed the relation to RelationCacheInsert().
i: RelationBuildDesc(), the same call that began above, calls
RelationCacheInsert(). This is a problem, because the entry needs
invalidation, but we already consumed the invalidation messages.
i: Inserts heap tuple with xmin=xid1. Does not insert to idx_ccnew. Commits.
r: validate_index() runs, adding the index tuple for xid1.
i: Second INSERT runs, adding a heap tuple with xmin=xid2. It uses the stale
relcache entry having indisready=f, so this insert also does not modify
idx_ccnew. This time, it was responsible for making that index entry;
nothing else will.
In summary, the relcache is deaf to invals of a given relation during
RelationBuildDesc() of that relation. One fix would be to make
RelationBuildDesc() more like this pseudocode:
retry:
WatchInvalFor(targetRelId);
... complicated build steps that may call AcceptInvalidationMessages() ...
if (ReceivedInvalFor(targetRelId))
goto retry;
I'm attaching a proof-of-concept sufficient to let your v7 pgbench test pass.
Among other things, I haven't checked whether other inval message types have
the same hazard. What alternative fix designs should we consider?
Attachment | Content-Type | Size |
---|---|---|
inval-build-race-v0.patch | text/x-diff | 1.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Munro | 2021-07-30 05:48:25 | Re: BUG #17116: Assert failed in SerialSetActiveSerXmin() on commit of parallelized serializable transaction |
Previous Message | PG Bug reporting form | 2021-07-29 22:35:01 | BUG #17128: minimum numeric 'integer' is -2147483647 not -2147483648 as documented |