From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | RE: Fix 035_standby_logical_decoding.pl race conditions |
Date: | 2025-04-02 07:16:25 |
Message-ID: | OSCPR01MB14966755BC3C534A0058EA07FF5AF2@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Lists: | pgsql-hackers |
Dear Amit, Bertrand,
> You have not added any injection point for the above case. Isn't it
> possible that if running_xact record is logged concurrently to the
> pruning record, it should move the active slot on standby, and the
> same failure should occur in this case as well?
I think this timing failure can indeed happen. Here is a reproducer:
```
$node_primary->safe_psql('testdb', qq[UPDATE prun SET s = 'D';]);
+$node_primary->safe_psql('testdb', 'CHECKPOINT');
+sleep(20);
$node_primary->safe_psql('testdb', qq[UPDATE prun SET s = 'E';]);
```
And here is my theory...
First, a new table was created with a smaller fillfactor. After three UPDATEs,
the page became full. On the fourth UPDATE (call it txn4), the backend process
pruned the page and a PRUNE_ON_ACCESS record was generated. That record asks
standbys to discard tuples from before the third UPDATE (call it txn3), so the
slot could be invalidated.
However, if a RUNNING_XACTS record is generated between txn3 and txn4, its
oldestRunningXact would be the same xid as txn4, and the catalog_xmin of the
standby slot would be advanced up to that point. The subsequent PRUNE_ON_ACCESS
record points to txn3, so slot invalidation won't happen in this case.
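To make the ordering argument concrete, here is a toy model of the two cases. It is only a sketch under simplifying assumptions: xids are plain integers (no wraparound), the slot's catalog_xmin jumps straight to oldestRunningXact when RUNNING_XACTS is replayed, and a slot conflicts when its catalog_xmin is at or before the horizon carried by the prune record. The names `Slot`, `replay_running_xacts`, `replay_prune` and `run` are hypothetical and are not PostgreSQL functions.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Hypothetical stand-in for a logical slot on the standby."""
    catalog_xmin: int
    invalidated: bool = False


def replay_running_xacts(slot: Slot, oldest_running_xid: int) -> None:
    # Assumption: an active slot advances catalog_xmin up to
    # oldestRunningXact; it never moves backwards.
    slot.catalog_xmin = max(slot.catalog_xmin, oldest_running_xid)


def replay_prune(slot: Slot, conflict_horizon: int) -> None:
    # Assumption: the slot conflicts if it may still need rows removed
    # at or before the horizon carried by the prune record.
    if slot.catalog_xmin <= conflict_horizon:
        slot.invalidated = True


def run(running_xacts_between: bool) -> bool:
    """Return True if the slot ends up invalidated."""
    txn3, txn4 = 103, 104          # illustrative xids only
    slot = Slot(catalog_xmin=100)  # slot created before the UPDATEs
    if running_xacts_between:
        # RUNNING_XACTS logged after txn3 finished, before txn4 pruned.
        replay_running_xacts(slot, oldest_running_xid=txn4)
    # PRUNE_ON_ACCESS from txn4 points to txn3.
    replay_prune(slot, conflict_horizon=txn3)
    return slot.invalidated
```

In this model `run(False)` reports an invalidated slot, while `run(True)` does not, which matches the intermittent failure described above.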
Based on this, I've updated the patch to use injection_points for scenario 5 as
well. Of course, the PG16/17 patches do not use an active slot for that scenario.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachment | Content-Type | Size |
---|---|---|
PG16-v4-0001-Stabilize-035_standby_logical_decoding.pl-by-usin.patch | application/octet-stream | 15.0 KB |
PG17-v4-0001-Stabilize-035_standby_logical_decoding.pl-by-usin.patch | application/octet-stream | 16.9 KB |
v4-0001-Stabilize-035_standby_logical_decoding.pl-by-usin.patch | application/octet-stream | 8.0 KB |