From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Bertrand Drouvot' <bertranddrouvot(dot)pg(at)gmail(dot)com> |
Cc: | 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | RE: Fix 035_standby_logical_decoding.pl race conditions |
Date: | 2025-04-07 06:15:13 |
Message-ID: | OSCPR01MB14966A5BBB6A16357B1D49D9CF5AA2@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Lists: | pgsql-hackers |
Dear Bertrand,
> I wonder if we could not keep this test and make the slot active for the
> vacuum full case. Looking at drongo's failure in [1], there is no occurrence
> of "vacuum full" and that's probably linked to Andres's explanation in [2]:
>
> "
> a VACUUM FULL on pg_class is
> used, which prevents logical decoding from progressing after it started (due
> to the logged AEL at the start of VACFULL).
> "
>
I have been debugging this and found that the VACUUM FULL case also has a timing
issue. This means that we cannot keep the testcase.
PSA the reproducer for PG17. IIUC this can happen even in PG16.
Here is what I think happens:
1. Run a CHECKPOINT and wait for some time in wait_until_vacuum_can_remove().
This ensures that a RUNNING_XACTS record can be generated and catalog_xmin can
be advanced after the user SQLs.
2. Assume that another RUNNING_XACTS record is generated *WHILE* the VACUUM FULL
is running. This can be caused by the periodic checkpoint or by the reproducer.
3. The logical walsender detects the RUNNING_XACTS record.
Note that this must happen before the startup process tries to invalidate the slot.
4. Some time later, the walsender receives the ack and advances the catalog_xmin.
Note again that this must happen before the startup process tries to invalidate
the slot.
5. The startup process detects the PRUNE_ON_ACCESS record and tries to invalidate
the slot. However, the catalog_xmin has already been advanced, so the
invalidation does not happen.
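To make the ordering in steps 3-5 concrete, here is a minimal toy model in
Python. It is not PostgreSQL code: the Slot class, the function names, and the
xid values 100/105/110 are all made up for illustration, and the comparison
ignores xid wraparound. It assumes that the slot is invalidated only when its
catalog_xmin precedes or equals the snapshotConflictHorizon of the replayed
record.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Hypothetical stand-in for a logical slot on the standby."""
    catalog_xmin: int
    invalidated: bool = False


def replay_prune(slot: Slot, snapshot_conflict_horizon: int) -> None:
    # Startup process (toy version): the slot conflicts only if it may still
    # need catalog rows at or below the horizon. Wraparound is ignored here.
    if slot.catalog_xmin <= snapshot_conflict_horizon:
        slot.invalidated = True


def advance_catalog_xmin(slot: Slot, new_xmin: int) -> None:
    # Walsender (toy version): after the ack, catalog_xmin only moves forward.
    slot.catalog_xmin = max(slot.catalog_xmin, new_xmin)


def run(walsender_first: bool) -> bool:
    """Return True if the slot ends up invalidated."""
    slot = Slot(catalog_xmin=100)   # made-up starting point
    horizon = 105                   # made-up snapshotConflictHorizon
    xmin_from_running_xacts = 110   # made-up value taken from RUNNING_XACTS

    if walsender_first:
        # Steps 3-4 happen before step 5: the race described above.
        advance_catalog_xmin(slot, xmin_from_running_xacts)
        replay_prune(slot, horizon)
    else:
        # Expected order: the startup process replays the PRUNE record first.
        replay_prune(slot, horizon)
        advance_catalog_xmin(slot, xmin_from_running_xacts)
    return slot.invalidated


if __name__ == "__main__":
    print("startup first   -> invalidated:", run(walsender_first=False))
    print("walsender first -> invalidated:", run(walsender_first=True))
```

In this model the only thing that differs between the two runs is which
process acts first, which is why the test outcome depends on timing.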
Analysis
========
While analyzing this workload, I found that VACUUM FULL can generate four
PRUNE_ON_ACCESS records. More specifically, the first two records are generated
while clustering the table, and the other two while updating
pg_database.datfrozenxid. Interestingly, the latter records are generated after
the transaction has finished; the VACUUM FULL command itself ends the
transaction once (in vacuum_rel) and then continues working. Without the delay
in the test code, the first PRUNE record leads to the invalidation of the slot,
and with the delay the fourth PRUNE record does. Per my analysis,
snapshotConflictHorizon is the xid of the transaction in which the first PRUNE
records are generated.
Based on this, I concluded that catalog_xmin can be advanced in between the
transactional and non-transactional PRUNE records. RequestCheckpoint() is added
to generate a RUNNING_XACTS record in between them.
Many thanks to Amit for helping me off-list to reproduce the issue.
Best regards,
Hayato Kuroda
Fujitsu LIMITED
Attachment | Content-Type | Size |
---|---|---|
repro_pg17.diffs | application/octet-stream | 2.2 KB |