Re: BUG #17949: Adding an index introduces serialisation anomalies.

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: artem(dot)anisimov(dot)255(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: BUG #17949: Adding an index introduces serialisation anomalies.
Date: 2023-06-24 13:59:10
Message-ID: 20230624135910.zufuollwp57otawo@ddolgov.remote.csb
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

> On Fri, Jun 23, 2023 at 04:05:42PM +0200, Dmitry Dolgov wrote:
> > I spent some more time trying to grok this today. FTR it reproduces
> > faster without the extra tuple that repro I posted inserts after
> > TRUNCATE (the point of that was to find out whether it was an
> > empty-to-non-empty transition). I still don't know what's wrong but I
> > am beginning to suspect the "fast" code. It seems as though, under
> > high concurrency, we sometimes don't scan a recently inserted
> > (invisible to our snapshot, but needed for SSI checks) tuple, but I
> > don't yet know why.
>
> Yep, it's definitely something in the "fast" path. Testing the same, but
> with an index having (fastupdate=off) works just fine for me.

I've managed to resolve it, or at least reduce the chances for the issue to
appear, via semi-randomly adding more CheckForSerializableConflictIn /
PredicateLock around the new sublist that has to be created in
ginHeapTupleFastInsert. I haven't seen the reproducer failing with this
changeset after running it multiple times for a couple of minutes, where on the
main branch, with the two fixes from Thomas included, it was failing within a
couple of seconds.

diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
@@ -198,6 +199,7 @@ makeSublist(Relation index, IndexTuple *tuples, int32 ntuples,
/*
* Write last page
*/
+ CheckForSerializableConflictIn(index, NULL, BufferGetBlockNumber(curBuffer));
res->tail = BufferGetBlockNumber(curBuffer);
res->tailFreeSize = writeListPage(index, curBuffer,

@@ -273,6 +275,11 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
separateList = true;
LockBuffer(metabuffer, GIN_UNLOCK);
}
+ else
+ {
+ CheckForSerializableConflictIn(index, NULL, metadata->head);
+ }
}

diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
@@ -140,7 +140,7 @@ collectMatchBitmap(GinBtreeData *btree, GinBtreeStack *stack,
* Predicate lock entry leaf page, following pages will be locked by
* moveRightIfItNeeded()
*/
- PredicateLockPage(btree->index, stack->buffer, snapshot);
+ PredicateLockPage(btree->index, BufferGetBlockNumber(stack->buffer), snapshot);

for (;;)
{
@@ -1925,6 +1926,8 @@ gingetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
/*
* Set up the scan keys, and check for unsatisfiable query.
*/
+ PredicateLockRelation(scan->indexRelation, scan->xs_snapshot);
ginFreeScanKeys(so); /* there should be no keys yet, but just to be

Now the last PredicateLockRelation does look rather weird. But without it, or
with this locking happening in scanPendingInsert (instead of locking the meta
page), or without other changes in ginfast.c, the reproducer is still failing.
This of course makes it not a proper solution by any mean, but hopefully it
will help to understand the problem a bit better.

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2023-06-24 16:56:06 Re: Server closed the connection unexpectedly (memory leak)
Previous Message Joe Conway 2023-06-24 13:09:44 Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG