Re: Fix 035_standby_logical_decoding.pl race conditions

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fix 035_standby_logical_decoding.pl race conditions
Date: 2025-03-25 11:34:21
Message-ID: CAA4eK1KuYgEbXswON7v_JGxV_X1QszVf97p3Te9a6FzdW8qXNg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 21, 2025 at 9:48 PM Bertrand Drouvot
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> So, I'm not sure I like the idea that much, but thinking out loud: I wonder if
> we could bypass the "active" slot checks in 16 and 17 and use injection points as
> proposed as of 18 (as we need the injection points changes proposed in 0001
> up-thread). Thoughts?
>

The key point is that snapshotConflictHorizon should always be greater
than or equal to oldestRunningXid for this test to pass. The challenge
is that vacuum logs the safest xid to be removed as
snapshotConflictHorizon, which I think will always be at least one
less than oldestRunningXid. So, we can't make it pass unless we
ensure that no running_xact record gets logged after the last
successful transaction (in this case, the SQL passed to the function
wait_until_vacuum_can_remove) and until the vacuum is replayed on
the standby. I see that even check_for_invalidation('pruning_', $logstart,
'with on-access pruning'); failed [1].
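
To make the condition concrete, here is a rough sketch of the comparison
involved. It is a simplified model, not the actual slot invalidation code:
the helper names are hypothetical, special XIDs are ignored, and it assumes
a slot conflicts when its catalog_xmin precedes or equals
snapshotConflictHorizon.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Wraparound-aware "id1 <= id2" for normal XIDs.
 * Simplification: special XIDs (invalid, bootstrap, frozen) are ignored.
 */
static bool
xid_precedes_or_equals(TransactionId id1, TransactionId id2)
{
    return (int32_t) (id1 - id2) <= 0;
}

/*
 * Hypothetical helper: would replaying a pruning record with the given
 * snapshotConflictHorizon invalidate a slot with this catalog_xmin?
 */
static bool
slot_conflicts_with_horizon(TransactionId slot_catalog_xmin,
                            TransactionId snapshot_conflict_horizon)
{
    return xid_precedes_or_equals(slot_catalog_xmin,
                                  snapshot_conflict_horizon);
}
```

With made-up numbers: if vacuum logs a horizon of 749 while
oldestRunningXid is 750, a slot still at catalog_xmin 748 conflicts, but
a slot whose catalog_xmin has already moved to 750 (which, as I understand
it, can happen for an active slot once a running_xact record is decoded)
does not, and the test then waits for an invalidation that never comes.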

Seeing all these failures, I wonder whether we can reliably test
active slots apart from the wal_level change test (aka Scenario 6:
incorrect wal_level on primary.). Sure, we can try by having some
injection-point kind of tests, but is it really worth it, given that
the active slots won't get invalidated anyway in the row removal
scenarios we are testing in this case? The other possibility is to add a
developer-level debug_disable_running_xact GUC to test this and
similar cases, or can't we have an injection point to control logging
this WAL record? I have seen the need to control logging the running_xact
record in other cases as well.

[1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-19%2007%3A08%3A16

--
With Regards,
Amit Kapila.
