Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
Date: 2024-01-15 03:59:01
Message-ID: ZaSthTtVmvKKuGNb@paquier.xyz
Lists: pgsql-hackers

On Fri, Jan 12, 2024 at 01:46:08PM +0000, Bertrand Drouvot wrote:
> 1) Michael's proposal up-thread (means tweak the test with a retry logic, retrying
> things if such a standby snapshot is found).
>
> 2) Don't report a test error for active slots in case its catalog_xmin advanced.
>
> I'd vote for 2) as:
>
> - this is a corner case and the vast majority of the animals don't report any
> issues (means the active slot conflict detection is already well covered).
>
> - even on the same animal it should be pretty rare for the active slot
> conflict detection to not be covered at all (and the "failing" one would probably
> move over time).
>
> - It may be possible that 1) ends up failing (as we'd need to put a limit on the
> retry logic anyhow).
>
> What do you think?
>
> And BTW, looking closely at wait_until_vacuum_can_remove(), I'm not sure it's
> fully correct, so I'll give it another look.
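For illustration, option 1 boils down to a bounded retry loop. Below is a minimal Python sketch, not the TAP test code: `attempt` is a hypothetical callable standing in for one run of the invalidation check, returning False when a standby snapshot record got in the way. The cap on attempts is exactly why this option can still end up failing.

```python
from typing import Callable


def run_with_retries(attempt: Callable[[], bool], max_attempts: int = 5) -> int:
    """Run `attempt` until it reports a clean run, up to `max_attempts` times.

    Returns the number of the attempt that succeeded. `attempt` is a
    hypothetical stand-in for one run of the slot invalidation check.
    """
    for i in range(1, max_attempts + 1):
        if attempt():
            return i
    # Every attempt was disturbed by a standby snapshot: give up.
    raise RuntimeError(f"standby snapshot interfered in all {max_attempts} attempts")
```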

The WAL records related to standby snapshots play a large part in
the randomness of the failures we are seeing. Alexander has mentioned
something else offlist: using SIGSTOP on the bgwriter to prevent these
records from being generated and make the test more stable. That would
not work on Windows, but I could live with that, knowing that logical
decoding on standbys has no platform-specific tweak in the code paths
we're testing here; the limitation would be to skip the test when
$windows_os is set.
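A minimal Python sketch of the SIGSTOP approach, assuming the bgwriter's PID is already known (how the test would fetch it is left out). `paused_process` is a hypothetical helper, not something the TAP framework provides, and it refuses to run on Windows, where SIGSTOP does not exist.

```python
import os
import signal
import sys
from contextlib import contextmanager


@contextmanager
def paused_process(pid: int):
    """Suspend `pid` with SIGSTOP for the duration of the block, then resume it."""
    if sys.platform == "win32":
        # No SIGSTOP/SIGCONT on Windows: the test would be skipped there.
        raise RuntimeError("SIGSTOP is not available on Windows")
    os.kill(pid, signal.SIGSTOP)
    try:
        yield
    finally:
        # Always resume the process, even if the checks in the block fail.
        os.kill(pid, signal.SIGCONT)
```

Usage would look like `with paused_process(bgwriter_pid): ...` around the VACUUM and the invalidation checks, with `bgwriter_pid` again hypothetical. The `finally` clause ensures that a failing check does not leave a frozen bgwriter behind.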

While thinking about that, a second idea came to mind: a
superuser-settable developer GUC to disable the generation of such WAL
records within certain areas of the test. This requires a small
implementation, nothing really huge, and it would be portable
everywhere. And it is not the first time I've been annoyed by these
records when wanting a predictable set of WAL records for a test
case.
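A simplified Python model of what such a switch would gate, not PostgreSQL's C code. The flag name `standby_snapshots_enabled` is hypothetical, and the model keeps only the periodic-interval check (15 seconds, mirroring LOG_SNAPSHOT_INTERVAL_MS in the bgwriter); the real code has additional conditions that are omitted here.

```python
from dataclasses import dataclass, field

LOG_SNAPSHOT_INTERVAL_MS = 15_000  # mirrors the bgwriter's snapshot interval


@dataclass
class SnapshotLogger:
    # Hypothetical developer switch; superuser-settable in the proposal.
    standby_snapshots_enabled: bool = True
    last_logged_ms: int = 0
    emitted: list = field(default_factory=list)

    def tick(self, now_ms: int) -> bool:
        """One bgwriter cycle: return True if a standby snapshot is logged."""
        if not self.standby_snapshots_enabled:
            # Switch off: no record, so the test sees a predictable WAL stream.
            return False
        if now_ms - self.last_logged_ms < LOG_SNAPSHOT_INTERVAL_MS:
            return False
        self.emitted.append(now_ms)
        self.last_logged_ms = now_ms
        return True
```

A test would turn the switch off around the sequence that must not see a standby snapshot and turn it back on afterwards. Outside that window, records keep flowing at the usual rate.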

Another possibility would be to move the generation of these records
out of the bgwriter, but we need such records at a regular frequency
for the availability of read-only standbys. And surely we'd want an
on/off switch anyway to get full control over test sequences.
--
Michael
