Re: BUG #18641: Logical decoding of two-phase commit fails with TOASTed default values

From: Takeshi Ideriha <iderihatakeshi(at)gmail(dot)com>
To: exclusion(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18641: Logical decoding of two-phase commit fails with TOASTed default values
Date: 2024-09-30 01:16:00
Message-ID: CAJPqx5esD13JsKx-Y_PpvCZNcqTPhsJa061CVz8aU3U0Ue-cdg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

Hi, Alexander and hackers

>
> The following bug has been logged on the website:
>
> Bug reference: 18641
> Logged by: Alexander Lakhin
> Email address: exclusion(at)gmail(dot)com
> PostgreSQL version: 17.0
> Operating system: Ubuntu 22.04
> Description:
>
> Logical decoding of a transaction like:
> BEGIN;
> INSERT INTO test_tab VALUES(1);
> PREPARE TRANSACTION 'pt';
>
> where test_tab defined as:
> CREATE TABLE test_tab(a int primary key,
> t text DEFAULT 'Some TOASTable value';
>
> for a subscription created with two_phase = on, fails as below:
> 2024-09-28 06:44:50.708 UTC [3741774:6][client backend][6/2:740] LOG:
> statement: PREPARE TRANSACTION 'pt';
> 2024-09-28 06:44:50.709 UTC [3741774:7][client backend][:0] LOG:
> disconnection: session time: 0:00:00.006 user=law database=postgres
> host=[local]
> 2024-09-28 06:44:50.713 UTC [3741741:17][walsender][25/0:0] ERROR:
> unexpected table_index_fetch_tuple call during logical decoding
> 2024-09-28 06:44:50.713 UTC [3741741:18][walsender][25/0:0] BACKTRACE:
> table_index_fetch_tuple at tableam.h:1253:3
> index_fetch_heap at indexam.c:637:10
> index_getnext_slot at indexam.c:697:6
> systable_getnext_ordered at genam.c:717:5
> heap_fetch_toast_slice at heaptoast.c:698:17
> table_relation_fetch_toast_slice at tableam.h:1924:1
> toast_fetch_datum at detoast.c:379:2
> detoast_attr at detoast.c:123:10
> pg_detoast_datum_packed at fmgr.c:1867:10
> text_to_cstring at varlena.c:220:23
> AttrDefaultFetch at relcache.c:4537:17
> RelationBuildTupleDesc at relcache.c:697:4
> RelationBuildDesc at relcache.c:1188:24
> RelationIdGetRelation at relcache.c:2116:7
> ReorderBufferProcessTXN at reorderbuffer.c:2246:17
> ReorderBufferReplay at reorderbuffer.c:2725:2
> ReorderBufferPrepare at reorderbuffer.c:2822:2
> DecodePrepare at decode.c:826:2
> xact_decode at decode.c:347:5
> LogicalDecodingProcessRecord at decode.c:123:1
> XLogSendLogical at walsender.c:3413:33
> WalSndLoop at walsender.c:2814:4
> StartLogicalReplication at walsender.c:1523:2
> exec_replication_command at walsender.c:2148:5
> PostgresMain at postgres.c:4763:11
> BackendInitialize at backend_startup.c:123:1
> postmaster_child_launch at launch_backend.c:281:9
> BackendStartup at postmaster.c:3593:8
> ServerLoop at postmaster.c:1677:10
> PostmasterMain at postmaster.c:1372:11
> startup_hacks at main.c:217:1
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)
> [0x7f30a90d7e40]
> postgres: node_publisher: walsender law postgres [local]
> START_REPLICATION(_start+0x25) [0x5647b6023565]
> 2024-09-28 06:44:50.713 UTC [3741741:19][walsender][25/0:0] LOG: released
> logical replication slot "test_sub"
> 2024-09-28 06:44:50.713 UTC [3741741:20][walsender][25/0:0] LOG: could not
> send data to client: Broken pipe
> 2024-09-28 06:44:50.713 UTC [3741741:21][walsender][25/0:0] FATAL:
> connection to client lost
>
> The issue can be easily reproduced with 022_twophase_cascade.pl modified
> like this:
> @@ -67,0 +68 @@ $node_A->safe_psql('postgres',
> +my $default = join('', map {chr(65 + rand 26)} (1 .. 10000));
> @@ -69 +70 @@ $node_B->safe_psql('postgres',
> - "CREATE TABLE test_tab (a int primary key, b bytea, c timestamptz
> DEFAULT now(), d bigint DEFAULT 999)"
> + "CREATE TABLE test_tab (a int primary key, b bytea, c timestamptz
> DEFAULT now(), d bigint DEFAULT 999, t text DEFAULT '$default')"
> @@ -72 +73 @@ $node_C->safe_psql('postgres',
> - "CREATE TABLE test_tab (a int primary key, b bytea, c timestamptz
> DEFAULT now(), d bigint DEFAULT 999)"
> + "CREATE TABLE test_tab (a int primary key, b bytea, c timestamptz
> DEFAULT now(), d bigint DEFAULT 999, t text DEFAULT '$default')"
>
> Reproduced on REL_15_STABLE (starting from a8fd13cab) .. master.

Thank you for reporting the issue.
I was able to reproduce the issue by modifying 022_twophase_cascade.pl
accordingly.

The scan for toast index is actually done under systable_getnext_ordered,
where HandleConcurrentAbort() is called. So it seems to me that this
scan is actually safe for concurrent abort in logical decoding.
Logic around HandleConcurrentAbort is intorduced
https://github.com/postgres/postgres/commit/7259736a6e5b7c7588fff9578370736a6648acbb.

Though I may not understand the logic around HandleConcurrentAbort
fully and I am not sure not-setting bsysscan at
systable_beginscan_ordered is intentional,
it seems to me setting and unsetting a bsysscan flag in
systable_beginscan_ordered and systable_endscan_ordered would resolve
the issue.
Patch is attached.

Regards,
Takeshi Ideriha

Attachment Content-Type Size
0001-Set-and-unset-bsysscan-flag-in-systable_-_ordered.patch application/octet-stream 1.5 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Richard Guo 2024-09-30 02:16:00 Re: BUG #18634: Wrong varnullingrels with merge ... when not matched by source
Previous Message Thomas Munro 2024-09-30 01:14:50 Re: Reference to - BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2024-09-30 01:49:09 Re: information_schema.view attgenerated
Previous Message Michael Paquier 2024-09-30 01:07:55 Re: query_id, pg_stat_activity, extended query protocol