Re: BF member drongo doesn't like 035_standby_logical_decoding.pl

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: BF member drongo doesn't like 035_standby_logical_decoding.pl
Date: 2025-01-27 07:13:01
Message-ID: Z5cx/aExSSutUK8E@ip-10-97-1-34.eu-west-3.compute.internal
Lists: pgsql-hackers

Hi,

On Fri, Jan 24, 2025 at 02:44:21PM -0500, Andres Freund wrote:
> Hm, maybe I'm missing something, but isn't it possible for the active slot to
> actually progress decoding past the conflict point? It's an active slot, with
> the consumer running in the background, so all that needs to happen for that
> is that logical decoding progresses past the conflict point. That requires
> there be some reference to a newer xid to be in the WAL, but there's nothing
> preventing that afaict?
>
>
> In fact, I now saw this comment:
>
> # Note that pg_current_snapshot() is used to get the horizon. It does
> # not generate a Transaction/COMMIT WAL record, decreasing the risk of
> # seeing a xl_running_xacts that would advance an active replication slot's
> # catalog_xmin. Advancing the active replication slot's catalog_xmin
> # would break some tests that expect the active slot to conflict with
> # the catalog xmin horizon.

Yeah, that comes from 46d8587b504 (where we tried to reduce, as much as
possible, the risk of an unwanted xl_running_xacts record being generated).

> Which seems precisely what's happening here?

Most probably, yes.

> If that's the issue, I think we need to find a way to block logical decoding
> from making forward progress during the test.
>
> The easiest way would be to stop pg_recvlogical and emit a bunch of changes,
> so that the backend is stalled sending out data. But that'd require a hard to
> predict amount of data to be emitted, which isn't great.

What about using an injection point instead to block pg_recvlogical until
we want it to resume?

> But perhaps we could do something smarter, by starting a session on the
> primary that acquires an access exclusive lock on a relation that logical
> decoding will need to access? The tricky bit likely would be that it'd
> somehow need to *not* prevent VACUUM on the primary.

Hm, I'm not sure how we could do that.

> If we could trigger VACUUM in a transaction on the primary this would be
> easy, but we can't.

Another idea that I had ([1]) was to make use of injection points
around places where RUNNING_XACTS is emitted. IIRC I tried to work on this, but
it was not as simple as it sounds, as we need the startup process not to be
blocked.

[1]: https://www.postgresql.org/message-id/ZmadPZlEecJNbhvI%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
