RE: Fix 035_standby_logical_decoding.pl race conditions

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Fix 035_standby_logical_decoding.pl race conditions
Date: 2025-04-02 07:16:25
Message-ID: OSCPR01MB14966755BC3C534A0058EA07FF5AF2@OSCPR01MB14966.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear Amit, Bertrand,

> You have not added any injection point for the above case. Isn't it
> possible that if running_xact record is logged concurrently to the
> pruning record, it should move the active slot on standby, and the
> same failure should occur in this case as well?

I looked into this, and I think the timing failure can indeed happen. Here is a reproducer:

```
$node_primary->safe_psql('testdb', qq[UPDATE prun SET s = 'D';]);
+$node_primary->safe_psql('testdb', 'CHECKPOINT');
+sleep(20);
$node_primary->safe_psql('testdb', qq[UPDATE prun SET s = 'E';]);
```

And here is my theory...

First, a new table is created with a smaller fillfactor. After three UPDATEs,
the page becomes full. At the fourth UPDATE (let's say txn4), the backend
process prunes the page and a PRUNE_ON_ACCESS record is generated.
That record asks standbys to discard tuples before the third UPDATE (say txn3),
so the slot could be invalidated.
However, if a RUNNING_XACTS record is generated between txn3 and txn4, its
oldestRunningXact would be the same xid as txn4, and the catalog_xmin of the
standby slot would be advanced up to that xid. The upcoming PRUNE_ON_ACCESS
record points to txn3, so slot invalidation won't happen in this case.
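
To make the ordering argument above concrete, here is a toy model in Python. It is only a sketch under assumptions, not the server code: the xids (103, 104), the starting catalog_xmin (100), and the helper names are hypothetical, xid wraparound is ignored, and the slot is assumed to advance its catalog_xmin as soon as the RUNNING_XACTS record is replayed. The conflict rule used is "the slot conflicts when its catalog_xmin is less than or equal to the conflict horizon carried by the prune record".

```python
# Toy model of the race; xids and helper names are hypothetical.
TXN3, TXN4 = 103, 104


def slot_conflicts(catalog_xmin: int, conflict_horizon: int) -> bool:
    """Simplified horizon check (no wraparound handling)."""
    return catalog_xmin <= conflict_horizon


def replay(records, catalog_xmin):
    """Replay (kind, xid) records; return (invalidated, final catalog_xmin)."""
    for kind, xid in records:
        if kind == "RUNNING_XACTS":
            # Assumption: the active slot advances to oldestRunningXact.
            catalog_xmin = max(catalog_xmin, xid)
        elif kind == "PRUNE_ON_ACCESS":
            # xid here is the conflict horizon of the prune record.
            if slot_conflicts(catalog_xmin, xid):
                return True, catalog_xmin
    return False, catalog_xmin


# Expected case: the prune record arrives while the slot is still behind txn3.
expected = replay([("PRUNE_ON_ACCESS", TXN3)], catalog_xmin=100)

# Racy case: RUNNING_XACTS (oldestRunningXact = txn4) is replayed first.
racy = replay(
    [("RUNNING_XACTS", TXN4), ("PRUNE_ON_ACCESS", TXN3)], catalog_xmin=100
)

print(expected)  # (True, 100): slot invalidated, test passes
print(racy)      # (False, 104): no invalidation, test fails
```

In the racy case the slot has already moved past txn3 by the time the prune record is replayed, which matches the failure seen with the CHECKPOINT and sleep in the reproducer.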

Based on this, I've updated the patch to use injection_points for scenario 5
as well. Of course, the PG16/17 patches won't use the active slot for that scenario.

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
PG16-v4-0001-Stabilize-035_standby_logical_decoding.pl-by-usin.patch application/octet-stream 15.0 KB
PG17-v4-0001-Stabilize-035_standby_logical_decoding.pl-by-usin.patch application/octet-stream 16.9 KB
v4-0001-Stabilize-035_standby_logical_decoding.pl-by-usin.patch application/octet-stream 8.0 KB
