From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
---|---|
To: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed |
Date: | 2024-01-11 20:00:01 |
Message-ID: | 6f85667e-5754-5d35-dbf1-c83fe08c1e48@gmail.com |
Lists: | pgsql-hackers |
11.01.2024 17:58, Bertrand Drouvot wrote:
> So I think we have 2 issues here:
>
> 1) The one you're mentioning above related to the on-access pruning test:
>
> I think the engine behavior is right here and that the test is racy. I'm
> proposing to bypass the active slot invalidation check for this particular test
> (as I don't see any "easy" way to take care of this race condition). Active slot
> invalidation is already well covered by the other tests anyway.
>
> I'm proposing the attached v4-0001-Fix-035_standby_logical_decoding.pl-race-conditio.patch
> for it.
>
> 2) The fact that we sometimes get a termination message that is not followed
> by an "obsolete" one (as discussed in [1]).
>
> For this one, I think that InvalidatePossiblyObsoleteSlot() is racy:
>
> In case of an active slot we proceed in 2 steps:
> - terminate the backend holding the slot
> - report the slot as obsolete
>
> This is racy because between the two we release the mutex on the slot, which
> means the effective_xmin and effective_catalog_xmin could advance during that time.
>
> I'm proposing the attached v1-0001-Fix-race-condition-in-InvalidatePossiblyObsoleteS.patch
> for it.
>
> Would it be possible to re-launch your repro (the slow one, not the pg_sleep() one)
> with both patches applied and see how it goes? (Please note that v4 replaces the v3
> you're already using in your tests.)
>
> If it helps, I'll propose v1-0001-Fix-race-condition-in-InvalidatePossiblyObsoleteS.patch
> in a dedicated hackers thread.
>
> [1]: https://www.postgresql.org/message-id/ZZ7GpII4bAYN%2BjT5%40ip-10-97-1-34.eu-west-3.compute.internal
Bertrand, I've relaunched the tests in the same slowed-down VM with both
patches applied (but with no other modifications) and got a failure
with pg_class, similar to what we had seen before:
9 # Failed test 'activeslot slot invalidation is logged with vacuum on pg_class'
9 # at t/035_standby_logical_decoding.pl line 230.
Please look at the attached logs (there I see a Standby/RUNNING_XACTS record near
'invalidating obsolete replication slot "row_removal_inactiveslot"').
Best regards,
Alexander
Attachment | Content-Type | Size |
---|---|---|
035-failure-vacuum-pg_class.tar.gz | application/gzip | 119.3 KB |