Re: Minimal logical decoding on standbys

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, fabriziomello(at)gmail(dot)com, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Rahila Syed <rahila(dot)syed(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minimal logical decoding on standbys
Date: 2023-04-03 06:35:17
Message-ID: CAA4eK1JX-HAmm3TuY=wCSgom+hmiKtkf_xaJLRX9JBo-iXUxHw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Apr 3, 2023 at 4:39 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> > From 56a9559555918a99c202a0924f7b2ede9de4e75d Mon Sep 17 00:00:00 2001
> > From: bdrouvotAWS <bdrouvot(at)amazon(dot)com>
> > Date: Tue, 7 Feb 2023 08:59:47 +0000
> > Subject: [PATCH v52 3/6] Allow logical decoding on standby.
> >
> > Allow a logical slot to be created on standby. Restrict its usage
> > or its creation if wal_level on primary is less than logical.
> > During slot creation, it's restart_lsn is set to the last replayed
> > LSN. Effectively, a logical slot creation on standby waits for an
> > xl_running_xact record to arrive from primary.
>
> Hmm, not sure if it really applies here, but this sounds similar to
> issues with track_commit_timestamps: namely, if the primary has it
> enabled and you start a standby with it enabled, that's fine; but if the
> primary is later shut down (but the standby isn't) and then the primary
> restarted with a lesser value, then the standby would misbehave without
> any obvious errors.
>

IIUC, the patch deals it by invalidating logical slots while replaying
the XLOG_PARAMETER_CHANGE record on standby. Then later during
decoding, if it encounters XLOG_PARAMETER_CHANGE, and wal_level from
primary has been reduced, it will return an error. There is a race
condition here as explained in the patch as follows:

+ /*
+ * If wal_level on primary is reduced to less than logical, then we
+ * want to prevent existing logical slots from being used.
+ * Existing logical slots on standby get invalidated when this WAL
+ * record is replayed; and further, slot creation fails when the
+ * wal level is not sufficient; but all these operations are not
+ * synchronized, so a logical slot may creep in while the wal_level
+ * is being reduced. Hence this extra check.
+ */
+ if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical decoding on standby requires "
+ "wal_level >= logical on master")));

Now, during this race condition, say not only does a logical slot
creep in but also one tries to decode WAL using the same then some
misbehavior is expected. I have not tried this so not sure if this is
really a problem but are you worried about something along those
lines?

> If that is a real problem, then perhaps you can
> solve it by copying some of the logic from track_commit_timestamps,
> which took a large number of iterations to get right.
>

IIUC, track_commit_timestamps deactivates the CommitTs module (by
using state in the shared memory) when replaying the
XLOG_PARAMETER_CHANGE record. Then later using that state it gives an
error from the appropriate place in the CommitTs module. If my
understanding is correct then that appears to be a better design than
what the patch is currently doing. Also, the error message used in
error_commit_ts_disabled() seems to be better than the current one.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kumar, Sachin 2023-04-03 06:53:56 RE: Initial Schema Sync for Logical Replication
Previous Message Kyotaro Horiguchi 2023-04-03 06:31:55 Re: BUG #17877: Referencing a system column in a foreign key leads to incorrect memory access