Re: [HACKERS] logical decoding of two-phase transactions

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Ajin Cherian <itsajin(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Date: 2021-03-08 14:39:10
Message-ID: CALDaNm0QLfH0FRrhXewD0VaRxPeKOC3Q5jrt7Kjj-BDn58LgvQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 8, 2021 at 6:25 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Mar 8, 2021 at 4:20 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Mon, Mar 8, 2021 at 11:30 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Mar 5, 2021 at 9:25 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > >
> > >
> > > Created new patch v53:
> >
> > Thanks for the updated patch.
> > I had noticed one issue, publisher does not get stopped normally in
> > the following case:
> > # Publisher steps
> > psql -d postgres -c "CREATE TABLE do_write(id serial primary key);"
> > psql -d postgres -c "INSERT INTO do_write VALUES(generate_series(1,10));"
> > psql -d postgres -c "CREATE PUBLICATION mypub FOR TABLE do_write;"
> >
> > # Subscriber steps
> > psql -d postgres -p 9999 -c "CREATE TABLE do_write(id serial primary key);"
> > psql -d postgres -p 9999 -c "INSERT INTO do_write VALUES(1);" # to cause a PK violation
> > psql -d postgres -p 9999 -c "CREATE SUBSCRIPTION mysub CONNECTION
> > 'host=localhost port=5432 dbname=postgres' PUBLICATION mypub WITH
> > (two_phase = true);"
> >
> > # prepare & commit prepared at publisher
> > psql -d postgres -c \
> > "begin; insert into do_write values (100); prepare transaction 'test1';"
> > psql -d postgres -c "commit prepared 'test1';"
> >
> > Stop publisher:
> > ./pg_ctl -D publisher stop
> > waiting for server to shut down............................................................... failed
> > pg_ctl: server does not shut down
> >
> > This is because the following process does not exit:
> > postgres: walsender vignesh 127.0.0.1(41550) START_REPLICATION
> >
> > It continuously loops at the below:
> >
>
> What happens if you don't set the two_phase option? If that also leads
> to the same error then can you please also check this case on the
> HEAD?

It succeeds without the two_phase option.
I analyzed this issue further; the details are below.
The WalSndDone function contains the following code, which handles
the walsender exit:
    if (WalSndCaughtUp && sentPtr == replicatedPtr &&
        !pq_is_send_pending())
    {
        QueryCompletion qc;

        /* Inform the standby that XLOG streaming is done */
        SetQueryCompletion(&qc, CMDTAG_COPY, 0);
        EndCommand(&qc, DestRemote, false);
        pq_flush();

        proc_exit(0);
    }

But with the two_phase option, replicatedPtr and sentPtr never
become the same:
(gdb) p /x replicatedPtr
$8 = 0x15faa70
(gdb) p /x sentPtr
$10 = 0x15fac50

Whereas without the two_phase option, replicatedPtr and sentPtr
become the same and the walsender exits:
(gdb) p /x sentPtr
$7 = 0x15fae10
(gdb) p /x replicatedPtr
$8 = 0x15fae10

I think that with the two_phase option, replicatedPtr never catches up
to sentPtr, so the exit condition in WalSndDone is never satisfied and
this process hangs.
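
To make the comparison concrete, here is a small standalone sketch that
models the exit test with the gdb values shown above. It is not the
walsender code: WalPos, walsender_can_exit, unconfirmed_bytes and
check_gdb_values are hypothetical names, and it assumes WalSndCaughtUp was
true and no send was pending, which I did not capture in the gdb session.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr (a 64-bit WAL position). */
typedef uint64_t WalPos;

/* Hypothetical model of the exit test in WalSndDone quoted earlier. */
bool
walsender_can_exit(bool caught_up, WalPos sent, WalPos replicated,
                   bool send_pending)
{
    return caught_up && sent == replicated && !send_pending;
}

/* Bytes of WAL sent but not yet confirmed by the subscriber. */
uint64_t
unconfirmed_bytes(WalPos sent, WalPos replicated)
{
    return sent - replicated;
}

/* Replays the two gdb observations from this mail. */
void
check_gdb_values(void)
{
    /*
     * two_phase = true: the positions differ, so the test fails.
     * caught_up = true and send_pending = false are assumptions.
     */
    assert(!walsender_can_exit(true, 0x15fac50, 0x15faa70, false));
    assert(unconfirmed_bytes(0x15fac50, 0x15faa70) == 0x1e0);

    /* Without two_phase: the positions match, so the walsender exits. */
    assert(walsender_can_exit(true, 0x15fae10, 0x15fae10, false));
    assert(unconfirmed_bytes(0x15fae10, 0x15fae10) == 0);
}
```

With the two_phase values, 0x1e0 (480) bytes stay unconfirmed, so under
these assumptions the position check alone is enough to keep the
walsender from exiting.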

Regards,
Vignesh
