From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
---|---|
To: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed |
Date: | 2024-01-12 11:00:01 |
Message-ID: | cc7925b8-30cc-c76d-b1b6-c9ec6bd36a03@gmail.com |
Lists: | pgsql-hackers |
Hi,
12.01.2024 10:15, Bertrand Drouvot wrote:
>
> For this one, the "good" news is that it looks like we no longer see the
> "terminating" message not followed by an "obsolete" message (so the engine
> behaves correctly).
>
> There is simply nothing related to the row_removal_activeslot at all (the catalog_xmin
> advanced and there is no conflict).
Yes, judging from all the failures that we see now, it looks like the
0001-Fix-race-condition...patch works as expected.
> And I agree that this is due to the Standby/RUNNING_XACTS that is "advancing" the
> catalog_xmin of the active slot.
>
>> Standby/RUNNING_XACTS is exactly why 039_end_of_wal.pl uses wal_level
>> = minimal, because these lead to unpredictable records being inserted,
>> impacting the reliability of the tests. We cannot do that here,
>> obviously. That may be a long shot, but could it be possible to tweak
>> the test with a retry logic, retrying things if such a standby
>> snapshot is found because we know that the invalidation is not going
>> to work anyway?
> I think it all depends on what the xl_running_xacts record contains (that is,
> whether or not it "advances" the catalog_xmin in our case).
>
> In our case it does advance it (should it occur) due to the "select txid_current()"
> that is done in wait_until_vacuum_can_remove() in 035_standby_logical_decoding.pl.
>
> I suggest using txid_current_snapshot() instead (which does not produce
> a Transaction/COMMIT WAL record, as opposed to txid_current()).
>
> I think that it could be "enough" for our case here, and it's what v5 attached is
> now doing.
>
> Let's give v5 a try? (please apply v1-0001-Fix-race-condition-in-InvalidatePossiblyObsoleteS.patch
> too).
Unfortunately, I've got the failure again (please see the attached logs).
(_primary.log confirms that I used exactly v5: I see no
txid_current() calls there.)
Best regards,
Alexander
Attachment | Content-Type | Size |
---|---|---|
035-failures-vacuum-pg_authid.tar.gz | application/gzip | 150.7 KB |