On Monday, October 7, 2024 1:57 PM Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Saturday, October 5, 2024 6:12 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
>
> Sending it again to avoid getting it stuck because the original was replied to
> both -hackers and -bugs.
>
> >
> > On Tue, Oct 1, 2024 at 7:07 PM Takeshi Ideriha
> > <iderihatakeshi(at)gmail(dot)com>
> > wrote:
> > >
> > > >We forgot to set/unset the flag in functions
> > > >systable_beginscan_ordered and systable_endscan_ordered. BTW,
> > >
> > > Thank you for the clarification.
> > >
> > > >shouldn't this occur even without prepare transaction? If so, we
> > > >need to backpatch this till 14.
> > >
> > > Yes, it occurred also at PG14.
> > > I did some tests using 015_stream.pl with some modification like
> > > below, which tests the subscription about stream is on but two-phase
> > > is off.
> > > The same issue occurred at both current head source code and PG14.
> > >
> > > ```
> > > --- a/src/test/subscription/t/015_stream.pl
> > > +++ b/src/test/subscription/t/015_stream.pl
> > > @@ -134,9 +134,11 @@ my $node_subscriber =
> > > PostgreSQL::Test::Cluster->new('subscriber');
> > > $node_subscriber->init;
> > > $node_subscriber->start;
> > >
> > > +my $default = join('', map {chr(65 + rand 26)} (1 .. 10000));
> > > +
> > > # Create some preexisting content on publisher
> > > $node_publisher->safe_psql('postgres',
> > > - "CREATE TABLE test_tab (a int primary key, b bytea)");
> > > + "CREATE TABLE test_tab (a int primary key, b bytea, t text
> > > DEFAULT '$default')");
> > > $node_publisher->safe_psql('postgres',
> > > "INSERT INTO test_tab VALUES (1, 'foo'), (2, 'bar')");
> > >
> > > @@ -144,7 +146,7 @@ $node_publisher->safe_psql('postgres', "CREATE
> > > TABLE test_tab_2 (a int)");
> > >
> > > # Setup structure on subscriber
> > > $node_subscriber->safe_psql('postgres',
> > > - "CREATE TABLE test_tab (a int primary key, b bytea, c
> > > timestamptz DEFAULT now(), d bigint DEFAULT 999)"
> > > + "CREATE TABLE test_tab (a int primary key, b bytea, t text
> > > DEFAULT '$default', c timestamptz DEFAULT now(), d bigint DEFAULT
> > > 999)"
> > >
> > > >Also, it is better to have a test for this, and let's ensure that
> > > >the new test doesn't increase the regression time too much if possible.
> > >
> > > Sure. I'm going to add some test codes in a few days.
> > >
> >
> > The above test shown in the email will work to test this issue.
> > However, we should use 'debug_logical_replication_streaming' as that
> > will help to reproduce the issue with just one write operation. Is it
> > possible to add the test in 018_stream_subxact_abort where we already use
> it?
>
> I think it might be better to put the test in test_decoding/sql/stream.sql as the
> bug is only related to the decoding part. Writing it in sql test also seems
> cheaper.
>
> Just share a top-up patch 0002 which includes the tests I suggested for
> reference. The test would fail without the fix.
>
> Node that I didn't change any logic in 0001 as the fix looks good to me. I just ran
> the pgindent for the 0001.
While testing the fix on back-branches, I realized that we don't have the
debug_logical_replication_streaming GUC on PG14 and 15. So I tried to generate
enough data to exceed the logical_decoding_work_mem in the test instead. Please
see the attachment for reference. I also rebased 0001 on back-branches, so just
share them here as well.
Best Regards,
Hou zj