RE: BUG #18641: Logical decoding of two-phase commit fails with TOASTed default values

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Takeshi Ideriha <iderihatakeshi(at)gmail(dot)com>
Cc: "exclusion(at)gmail(dot)com" <exclusion(at)gmail(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: RE: BUG #18641: Logical decoding of two-phase commit fails with TOASTed default values
Date: 2024-10-07 08:36:38
Message-ID: OS0PR01MB5716C29EFB5576B84DBA7A2A947D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Monday, October 7, 2024 1:57 PM Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Saturday, October 5, 2024 6:12 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
>
> Sending it again to avoid getting it stuck because the original was replied to
> both -hackers and -bugs.
>
> >
> > On Tue, Oct 1, 2024 at 7:07 PM Takeshi Ideriha
> > <iderihatakeshi(at)gmail(dot)com>
> > wrote:
> > >
> > > >We forgot to set/unset the flag in functions
> > > >systable_beginscan_ordered and systable_endscan_ordered. BTW,
> > >
> > > Thank you for the clarification.
> > >
> > > >shouldn't this occur even without prepare transaction? If so, we
> > > >need to backpatch this till 14.
> > >
> > > Yes, it occurred also at PG14.
> > > I did some tests using 015_stream.pl with some modification like
> > > below, which tests the subscription about stream is on but two-phase
> > > is off.
> > > The same issue occurred at both current head source code and PG14.
> > >
> > > ```
> > > --- a/src/test/subscription/t/015_stream.pl
> > > +++ b/src/test/subscription/t/015_stream.pl
> > > @@ -134,9 +134,11 @@ my $node_subscriber =
> > > PostgreSQL::Test::Cluster->new('subscriber');
> > > $node_subscriber->init;
> > > $node_subscriber->start;
> > >
> > > +my $default = join('', map {chr(65 + rand 26)} (1 .. 10000));
> > > +
> > > # Create some preexisting content on publisher
> > > $node_publisher->safe_psql('postgres',
> > > - "CREATE TABLE test_tab (a int primary key, b bytea)");
> > > + "CREATE TABLE test_tab (a int primary key, b bytea, t text
> > > DEFAULT '$default')");
> > > $node_publisher->safe_psql('postgres',
> > > "INSERT INTO test_tab VALUES (1, 'foo'), (2, 'bar')");
> > >
> > > @@ -144,7 +146,7 @@ $node_publisher->safe_psql('postgres', "CREATE
> > > TABLE test_tab_2 (a int)");
> > >
> > > # Setup structure on subscriber
> > > $node_subscriber->safe_psql('postgres',
> > > - "CREATE TABLE test_tab (a int primary key, b bytea, c
> > > timestamptz DEFAULT now(), d bigint DEFAULT 999)"
> > > + "CREATE TABLE test_tab (a int primary key, b bytea, t text
> > > DEFAULT '$default', c timestamptz DEFAULT now(), d bigint DEFAULT
> > > 999)"
> > >
> > > >Also, it is better to have a test for this, and let's ensure that
> > > >the new test doesn't increase the regression time too much if possible.
> > >
> > > Sure. I'm going to add some test codes in a few days.
> > >
> >
> > The above test shown in the email will work to test this issue.
> > However, we should use 'debug_logical_replication_streaming' as that
> > will help to reproduce the issue with just one write operation. Is it
> > possible to add the test in 018_stream_subxact_abort where we already use
> it?
>
> I think it might be better to put the test in test_decoding/sql/stream.sql as the
> bug is only related to the decoding part. Writing it in sql test also seems
> cheaper.
>
> Just share a top-up patch 0002 which includes the tests I suggested for
> reference. The test would fail without the fix.
>
> Node that I didn't change any logic in 0001 as the fix looks good to me. I just ran
> the pgindent for the 0001.

While testing the fix on back-branches, I realized that we don't have the
debug_logical_replication_streaming GUC on PG14 and 15. So I tried to generate
enough data to exceed the logical_decoding_work_mem in the test instead. Please
see the attachment for reference. I also rebased 0001 on back-branches, so just
share them here as well.

Best Regards,
Hou zj

Attachment Content-Type Size
v3-0001-Set-and-unset-bsysscan-flag-in-systable_-_ordered-HEAD.patch application/octet-stream 1.5 KB
v3-0001-Set-and-unset-bsysscan-flag-in-systable_-_ordered-PG16.patch application/octet-stream 1.5 KB
v3-0002-add-a-testcase-in-stream.sql-PG16.patch application/octet-stream 2.9 KB
v3-0002-add-a-testcase-in-stream.sql-PG14-PG15.patch application/octet-stream 2.8 KB
v3-0001-Set-and-unset-bsysscan-flag-in-systable_-_ordered-PG14-PG15.patch application/octet-stream 1.5 KB
v3-0001-Set-and-unset-bsysscan-flag-in-systable_-_ordered-PG17.patch application/octet-stream 1.5 KB
v3-0002-add-a-testcase-in-stream.sql-HEAD.patch application/octet-stream 2.8 KB
v3-0002-add-a-testcase-in-stream.sql-PG17.patch application/octet-stream 2.8 KB

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Craig Milhiser 2024-10-07 11:42:14 Re: Reference to - BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker
Previous Message Zhijie Hou (Fujitsu) 2024-10-07 05:57:26 RE: BUG #18641: Logical decoding of two-phase commit fails with TOASTed default values

Browse pgsql-hackers by date

  From Date Subject
Next Message Nitin Jadhav 2024-10-07 08:36:54 Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Previous Message Guillaume Lelarge 2024-10-07 08:19:04 Re: Parallel workers stats in pg_stat_database