From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
Cc: | PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: BUG #17072: Assert for clogGroupNext failed due to a race condition in TransactionGroupUpdateXidStatus() |
Date: | 2021-06-26 09:57:30 |
Message-ID: | CAA4eK1+yUrD=xxxRQWiH_dFo8go_W-R-C3FsbvwkPnMtdKe74A@mail.gmail.com |
Lists: | pgsql-bugs |
On Fri, Jun 25, 2021 at 4:30 PM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
>
> Hello Amit,
> 25.06.2021 12:55, Amit Kapila wrote:
> > On Fri, Jun 25, 2021 at 12:20 AM PG Bug reporting form
> > <noreply(at)postgresql(dot)org> wrote:
> >> The offending process (the one that left a "valid" clogGroupNext) is
> >> 60d48c2d.ea21. It looks like it got from
> >> pg_atomic_compare_exchange_u32() the nextidx value that was written to
> >> clogGroupFirst by process 60d48c2e.ebc5, and exited just after that.
> >>
> > Your analysis seems to be in the right direction. Can you try
> > setting clogGroupNext to INVALID_PGPROCNO
> > (pg_atomic_write_u32(&proc->clogGroupNext, INVALID_PGPROCNO);) before
> > we return false in the first while (true) loop in the function
> > TransactionGroupUpdateXidStatus()?
> With this modification, that assert is not triggered; all 100 iterations
> pass fine (triple-checked).
>
Okay, please find attached a patch for this.
> > I think this should be reproducible on all branches from v11 through
> > HEAD. Have you tried any other branch? I'll also try to reproduce
> > it.
> I've reproduced it on REL_11_STABLE, REL_12_STABLE, REL_13_STABLE, and
> master.
>
Could you please verify whether the attached patch fixes it in all the
branches? I have also reproduced it in a slightly different way, using a
debugger and three sessions trying to commit at the same time. After the
first session became the first group member (the group leader), I let the
second session check whether it could become a member and stopped it via
the debugger just before it joined. Then I let the first session complete
its transaction and let the third session become the new group leader
(first group member). When the second session then tries to become a
member, it notices that the leader has changed and tries again to join
the new leader's group. At that point I forced it via the debugger to
return false and perform the commit by itself. Finally, after
disconnecting and reconnecting the second session, the assertion failure
you reported appears. The attached patch fixes the assertion failure.
--
With Regards,
Amit Kapila.
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Fix-race-condition-in-TransactionGroupUpdateXidSt.patch | application/octet-stream | 1.8 KB |