BUG #18280: logical decoding build wrong snapshot for subtransactions

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: feichanghong(at)qq(dot)com
Subject: BUG #18280: logical decoding build wrong snapshot for subtransactions
Date: 2024-01-10 12:51:08
Message-ID: 18280-4c8060178cb41750@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 18280
Logged by: Fei Changhong
Email address: feichanghong(at)qq(dot)com
PostgreSQL version: 15.5
Operating system: CentOS 7, kernel version 5.10.13
Description:

Hi all,
I encountered a problem with the historic snapshot used by logical
decoding. The symptom is the error "ERROR: could not map filenode
"base/5/16390" to relation OID".
A subtransaction that modified the catalog ended before the restart_lsn of
the logical replication slot, but the commit WAL record of its parent
transaction comes after the restart_lsn. The WAL records of the
subtransaction are therefore never decoded, so the subtransaction is not
marked as containing catalog changes. As a result, it is not recorded in
the committed list of the snapshot.
I believe the root cause is that SnapBuildXidSetCatalogChanges (introduced
in 272248a) skips the check for subtransactions when the parent transaction
has already been marked as containing catalog changes.
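
To make the sequence above easier to follow, here is a toy model in Python.
It is only a sketch of the reasoning in this report, not PostgreSQL code:
the function name, the record layout, the XIDs (100 for the parent, 101 for
the subtransaction) and the LSN values are all hypothetical, and the
commit-time step is heavily simplified.
```python
# Toy model only: names, XIDs and LSNs are hypothetical, not PostgreSQL's.
def committed_catalog_xids(wal, restart_lsn, skip_if_parent_marked):
    """Return the XIDs that would end up in the snapshot's committed list."""
    marked = set()   # XIDs the decoder knows to have changed the catalog
    committed = []
    for lsn, kind, xid, subxids in wal:
        if lsn < restart_lsn:
            continue  # records before restart_lsn are never decoded
        if kind == "catalog_change":
            marked.add(xid)
        elif kind == "commit":
            # Simplified commit-time step: mark the parent and all of its
            # subtransactions, unless the parent is already marked and the
            # shortcut described in the report is enabled.
            if not (skip_if_parent_marked and xid in marked):
                marked.update([xid, *subxids])
            committed += [x for x in [xid, *subxids] if x in marked]
    return committed


wal = [
    (10, "catalog_change", 101, []),  # subxact creates tbl1_part_p1
    (30, "catalog_change", 100, []),  # parent creates tbl1_part_p2
    (40, "commit", 100, [101]),       # parent commits, listing its subxact
]
RESTART_LSN = 20  # falls between the subxact's change and the parent's

# With the shortcut, the subtransaction is missing from the committed list.
print(committed_catalog_xids(wal, RESTART_LSN, skip_if_parent_marked=True))
# Without it, both the parent and the subtransaction are recorded.
print(committed_catalog_xids(wal, RESTART_LSN, skip_if_parent_marked=False))
```
The second call only illustrates the effect of not taking the shortcut in
this model; it is not meant as the fix itself.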

The problem can be reproduced with the following steps (to keep the bgwriter
from writing XLOG_RUNNING_XACTS WAL records during the test, I increased the
value of LOG_SNAPSHOT_INTERVAL_MS):
session 1:
```
CREATE TABLE tbl1 (val1 integer, val2 integer);
CREATE TABLE tbl1_part (val1 integer) PARTITION BY RANGE (val1);

SELECT 'init'
  FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');

BEGIN;
SAVEPOINT sp1;
CREATE TABLE tbl1_part_p1 PARTITION OF tbl1_part FOR VALUES FROM (0) TO (10);
RELEASE SAVEPOINT sp1;
```

session 2:
```
CHECKPOINT;
```

session 1:
```
CREATE TABLE tbl1_part_p2 PARTITION OF tbl1_part FOR VALUES FROM (10) TO (20);
COMMIT;
BEGIN;
TRUNCATE tbl1;
```

session 2:
```
CHECKPOINT;
SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL,
    'skip-empty-xacts', '1', 'include-xids', '0');
INSERT INTO tbl1_part VALUES (1);
SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL,
    'skip-empty-xacts', '1', 'include-xids', '0');
```

I will provide a patch to fix this problem later.
