Re: Unexpected data when subscribing to logical replication slot

From: Daniel McKenzie <daniel(dot)mckenzie(at)curvedental(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, tomas(dot)vondra(at)enterprisedb(dot)com, tfoertsch123(at)gmail(dot)com
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Unexpected data when subscribing to logical replication slot
Date: 2024-05-10 12:11:42
Message-ID: CACKLY6ibsfhfFrrLbSXANcjsRFa5RUGxeY5ra-hcUnOPAtJVDg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thank you all for your input.

We have solved the problem by -

1. Configuring wal2json to include xids
<https://github.com/eulerto/wal2json/blob/master/README.md>.
2. Updating our enrichment queries to return the xmin
<https://www.postgresql.org/docs/9.1/ddl-system-columns.html>.
3. Retrying our enrichment queries after a brief sleep if xmin < xid.

This seems to be working very well so far.

Daniel McKenzie
Software Developer

Office: +1 403.910.5927 x 251
Mobile: +44 7712 159045
Website: www.curvedental.com

*Curve Dental Confidentiality Notice*
This message is intended exclusively for the individual or entity to which
it is addressed. This communication may contain information that is
proprietary, privileged, confidential, or otherwise legally exempt from
disclosure. If you are not the named addressee, you are not authorized to
read, print, retain, copy, or disseminate this message or any part of it.
If you have received this message in error, please notify the sender
immediately by replying to this e-mail and delete all copies of this
message.

On Thu, May 9, 2024 at 5:45 PM Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
wrote:

> On 5/9/24 00:32, Daniel McKenzie wrote:
>
> > We've had this running in live now for years without a hiccup so we are
> > surprised to learn that we have this massive race condition and it just
> > so happens that the hardware is fast enough to process the transaction
> > before the .NET application can react to replication slot changes.
>
> On broad scale I'm going to say that over 'for years' there has been an
> increase in load on the Postgres server as well as the I/0 system of the
> machine it is running on. What you are seeing now is the canary in the
> mine giving you the heads up that more trouble lies ahead as the
> hardware and software is reaching load limits.
>
> On finer scale my guess is that the following is happening when
> synchronous_commit = on:
>
> 1) Postgres session #1 does data change.
>
> 2) This is replicated out and picked up by wal2json, which sees the new
> data.
>
> 3) The Postgres server waits for the confirmation that the WAL record
> has been written out to disk. Upon confirmation it commits on the
> server. This is the part that I am not sure of in relation to wal2json.
>
> 4) Postgres session #2 queries the database for the record. In the case
> where 3) has not completed it sees the old values as the data change in
> session #1 has not committed and therefore the new values are not seen
> by other sessions.
>
> >
> > Daniel McKenzie
> > Software Developer
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Dimitrios Apostolou 2024-05-10 18:38:07 SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
Previous Message Avi Weinberg 2024-05-10 11:18:48 Scenarios that Replication Slot will be Marked as Active = false