From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Subject: | RE: Assertion failure in SnapBuildInitialSnapshot() |
Date: | 2023-01-28 14:54:22 |
Message-ID: | TYAPR01MB586690B7DE22E362A4A65F8CF5CD9@TYAPR01MB5866.jpnprd01.prod.outlook.com |
Lists: | pgsql-hackers |
Dear Amit, Sawada-san,
I have also reproduced the failure on PG15 with some debug logging, and I agree that
some process changed procArray->replication_slot_xmin to InvalidTransactionId.
> > The same assertion failure has been reported on another thread[1].
> > Since I could reproduce this issue several times in my environment
> > I've investigated the root cause.
> >
> > I think there is a race condition of updating
> > procArray->replication_slot_xmin by CreateInitDecodingContext() and
> > LogicalConfirmReceivedLocation().
> >
> > What I observed in the test was that a walsender process called:
> > SnapBuildProcessRunningXacts()
> > LogicalIncreaseXminForSlot()
> > LogicalConfirmReceivedLocation()
> > ReplicationSlotsComputeRequiredXmin(false).
> >
> > In ReplicationSlotsComputeRequiredXmin() it acquired the
> > ReplicationSlotControlLock and got 0 as the minimum xmin since there
> > was no wal sender having effective_xmin.
> >
>
> What about the current walsender process which is processing
> running_xacts via SnapBuildProcessRunningXacts()? Isn't that walsender
> slot's effective_xmin have a non-zero value? If not, then why?
Normal walsenders, i.e. those not used for tablesync, create a replication slot with
the NOEXPORT_SNAPSHOT option. I think in this case CreateInitDecodingContext() is
called with need_full_snapshot = false, so slot->effective_xmin is not updated.
It is set to InvalidTransactionId in ReplicationSlotCreate() and no function updates
it afterwards. Hence the slot acquired by the walsender may have an invalid effective_xmin.
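To make the above concrete, here is a small standalone model of the behavior as I understand it. It is only a sketch and not the PostgreSQL source: ModelSlot and the model_* functions are hypothetical names, and the minimum is taken with a plain "<" comparison, which ignores xid wraparound. It assumes that effective_xmin starts out invalid and is assigned only when need_full_snapshot is true.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)
#define TransactionIdIsValid(xid) ((xid) != InvalidTransactionId)

/* Hypothetical, stripped-down stand-in for a replication slot. */
typedef struct ModelSlot
{
    bool          in_use;
    TransactionId effective_xmin;
} ModelSlot;

/* Models slot creation: effective_xmin starts out invalid. */
static ModelSlot
model_slot_create(void)
{
    ModelSlot slot = {true, InvalidTransactionId};

    return slot;
}

/*
 * Models the initial decoding context setup: effective_xmin is assigned
 * only when a full snapshot is requested (assumption taken from the
 * description above, not copied from the real function).
 */
static void
model_init_decoding_context(ModelSlot *slot, TransactionId xmin_horizon,
                            bool need_full_snapshot)
{
    if (need_full_snapshot)
        slot->effective_xmin = xmin_horizon;
}

/*
 * Models the aggregation over all slots: returns the smallest valid
 * effective_xmin, or InvalidTransactionId (0) if no slot has one.
 */
static TransactionId
model_compute_required_xmin(const ModelSlot *slots, size_t nslots)
{
    TransactionId agg = InvalidTransactionId;

    for (size_t i = 0; i < nslots; i++)
    {
        TransactionId xmin = slots[i].effective_xmin;

        if (!slots[i].in_use || !TransactionIdIsValid(xmin))
            continue;
        if (!TransactionIdIsValid(agg) || xmin < agg)
            agg = xmin;
    }
    return agg;
}
```

In this model, a slot initialized with need_full_snapshot = false contributes nothing to the aggregate, so if every slot is of that kind the computed minimum is 0, which matches what was observed in ReplicationSlotsComputeRequiredXmin().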
Best Regards,
Hayato Kuroda
FUJITSU LIMITED