Re: BUG #17368: Assert failed in GetSafeSnapshot() for SERIALIZABLE READ ONLY DEFERRABLE transaction

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17368: Assert failed in GetSafeSnapshot() for SERIALIZABLE READ ONLY DEFERRABLE transaction
Date: 2022-01-15 16:00:00
Message-ID: 18a116d2-7927-6a62-aa60-ab16c53183f9@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

15.01.2022 11:00, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference: 17368
This bug is separated from the #17116, because it's an unrelated issue.
But it was discussed in the thread #17116 and there was proposed the
working v4-0002-Fix-race-in-SERIALIZABLE-READ-ONLY.patch [1].
I've attached the modification for the read-only-anomaly-3 test to
reproduce the bug (the result is shown in the bug report).
For ease of reading, all relevant info quoted here:
> On Wed, Jul 28, 2021 at 5:02 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>> On Wed, Jul 28, 2021 at 9:00 AM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
>>> (Except for read-only-anomaly-3, that causes another
>>> assertion fail but I believe that it's not related to the initial issue
>>> and deserves another bug report.)
>> Huh. I tried and failed to find that one with concurrent runs. I'll
>> wait for your next report before I do anything, just in case there's a
>> connection.
> This is indeed an unrelated issue, and much older. After many
> kilowatt hours, I reproduced it here and worked out that it comes from
> GetSafeSnapshot() not expecting possibleUnsafeConflicts to be empty
> when WritableSxactCount == 0. More soon.

> On Fri, Jul 30, 2021 at 9:41 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> The problem is that if we don't take the fast exit here:
>
> if (XactReadOnly && PredXact->WritableSxactCount == 0)
> {
> ReleasePredXact(sxact);
> LWLockRelease(SerializableXactHashLock);
> return snapshot;
> }
>
> ... then we search for writable SSI transactions here:
>
> for (othersxact = FirstPredXact();
> othersxact != NULL;
> othersxact = NextPredXact(othersxact))
> {
> if (!SxactIsCommitted(othersxact)
> && !SxactIsDoomed(othersxact)
> && !SxactIsReadOnly(othersxact))
> {
> SetPossibleUnsafeConflict(sxact, othersxact);
> }
> }
>
> ... but that can find nothing if all writers happen to be doomed.
> WritableSxactCount doesn't count read-only or committed transactions
> so we don't have to worry about those confusing us, but it does count
> doomed transactions. Having an empty list here is mostly fine, except
> that nobody will ever set our RO_SAFE flag, and in the case of
> DEFERRABLE, the assertion that someone has set it will fail. This is
> the case since:
>
> ===
> commit bdaabb9b22caa71021754d3967b4032b194d9880
> Author: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
> Date: Fri Jul 8 00:36:30 2011 +0300
>
> There's a small window wherein a transaction is committed but not yet
> on the finished list, and we shouldn't flag it as a potential conflict
> if so. We can also skip adding a doomed transaction to the list of
> possible conflicts because we know it won't commit.
>
> Dan Ports and Kevin Grittner.
> ===
>
> I see three fixes:
>
> 1. Check if the list is empty, and if so, set our own
> SXACT_FLAG_RO_SAFE. Then the assertion in GetSafeSnapshot() will
> pass, and also non-DEFERRABLE callers will eventually see the flag and
> opt out.
>
> 2. Check if the list is empty, and if so, opt out immediately (as we
> do when WriteableSxactCount == 0). This would be strictly better than
> #1, because it happens sooner while we still have the lock. On the
> other hand, we'd have to undo more effects, and that involves writing
> more fragile and very rarely run code.
>
> 3. Just delete the assertion in GetSafeSnapshot(). This is
> attractively simple but it means that READ ONLY non-DEFERRABLE
> transactions arbitrarily miss out on the RO_SAFE optimisation in this
> (admittedly rare) case.
>
> The attached 0002 has idea #1, which I prefer for back-patching. I
> think #2 might be a good idea if we take the filtering logic further
> so that it becomes more likely that you get an empty list (for
> example, why don't we skip transactions that are running in a
> different database?), but then that'd be new code, not something we
> back-patch. Thoughts?
[1]
https://www.postgresql.org/message-id/CA%2BhUKGJd-%3DmdUdS%2BGh-FN4rZgAg4M-V%3D%3DG7FMCU-KbUffyPJDw%40mail.gmail.com

Best regards,
Alexander

Attachment Content-Type Size
read-only-anomaly-3+.patch text/x-patch 793 bytes

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Alexander Lakhin 2022-01-16 06:00:01 Re: BUG #17116: Assert failed in SerialSetActiveSerXmin() on commit of parallelized serializable transaction
Previous Message PG Bug reporting form 2022-01-15 08:00:01 BUG #17368: Assert failed in GetSafeSnapshot() for SERIALIZABLE READ ONLY DEFERRABLE transaction