Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
Date: 2024-01-12 07:15:44
Message-ID: ZaDnIMTbT+0yR5LZ@ip-10-97-1-34.eu-west-3.compute.internal
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Fri, Jan 12, 2024 at 07:01:55AM +0900, Michael Paquier wrote:
> On Thu, Jan 11, 2024 at 11:00:01PM +0300, Alexander Lakhin wrote:
> > Bertrand, I've relaunched tests in the same slowed down VM with both
> > patches applied (but with no other modifications) and got a failure
> > with pg_class, similar to what we had seen before:
> > 9       #   Failed test 'activeslot slot invalidation is logged with vacuum on pg_class'
> > 9       #   at t/035_standby_logical_decoding.pl line 230.
> >
> > Please look at the logs attached (I see there Standby/RUNNING_XACTS near
> > 'invalidating obsolete replication slot "row_removal_inactiveslot"').

Thanks!

For this one, the "good" news is that it looks like that we don’t see the
"terminating" message not followed by an "obsolete" message (so the engine
behaves correctly) anymore.

There is simply nothing related to the row_removal_activeslot at all (the catalog_xmin
advanced and there is no conflict).

And I agree that this is due to the Standby/RUNNING_XACTS that is "advancing" the
catalog_xmin of the active slot.

> Standby/RUNNING_XACTS is exactly why 039_end_of_wal.pl uses wal_level
> = minimal, because these lead to unpredictible records inserted,
> impacting the reliability of the tests. We cannot do that here,
> obviously. That may be a long shot, but could it be possible to tweak
> the test with a retry logic, retrying things if such a standby
> snapshot is found because we know that the invalidation is not going
> to work anyway?

I think it all depends what the xl_running_xacts does contain (means does it
"advance" or not the catalog_xmin in our case).

In our case it does advance it (should it occurs) due to the "select txid_current()"
that is done in wait_until_vacuum_can_remove() in 035_standby_logical_decoding.pl.

I suggest to make use of txid_current_snapshot() instead (that does not produce
a Transaction/COMMIT wal record, as opposed to txid_current()).

I think that it could be "enough" for our case here, and it's what v5 attached is
now doing.

Let's give v5 a try? (please apply v1-0001-Fix-race-condition-in-InvalidatePossiblyObsoleteS.patch
too).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v5-0001-Fix-035_standby_logical_decoding.pl-race-conditio.patch text/x-diff 9.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2024-01-12 08:49:04 Re: [PoC] Improve dead tuple storage for lazy vacuum
Previous Message Bertrand Drouvot 2024-01-12 06:54:09 Re: Synchronizing slots from primary to standby