From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Ajin Cherian <itsajin(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Failure of subscription tests with topminnow |
Date: | 2021-08-25 11:32:01 |
Message-ID: | CAD21AoCWDKJ1NzZ=WCKVw59AUjnkbdTYZx_OpirQ9zDjYF4mxw@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Aug 25, 2021 at 6:53 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Wed, Aug 25, 2021 at 5:43 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> >
> > On Wed, Aug 25, 2021 at 4:22 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Aug 25, 2021 at 8:00 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> > > >
> > > > On Tue, Aug 24, 2021 at 11:12 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > > > But will poll function still poll or exit? Have you tried that?
> > > >
> > > > I have forced that condition with a changed query and found that the
> > > > poll will not exit in case of a NULL return.
> > > >
> > >
> > > What if the query in a poll is fired just before we get an error
> > > "tap_sub ERROR: replication slot "tap_sub" is active for PID 16336"?
> > > Won't at that stage both old and new walsender's are present, so the
> > > query might return true. You can check that via debugger by stopping
> > > just before this error occurs and then check pg_stat_replication view.
> >
> > If this error happens then the PID is NOT updated as the pid in the
> > Replication slot. I have checked this
> > and explained this in my first email itself
> >
>
> Sorry about the above email, I misunderstood. I was looking at
> pg_stat_replication_slot rather than pg_stat_replication hence the confusion.
> Amit is correct, just prior to the walsender erroring out, it briefly
> appears in the
> pg_stat_replication, and that is why this error happens. Sorry for the
> confusion.
> I just confirmed it, got both the walsenders stopped in the debugger:
>
> postgres=# select pid from pg_stat_replication where application_name = 'sub';
> pid
> ------
> 7899
> 7993
> (2 rows)
IIUC the query[1] used for polling returns two rows in this case: {t,
f} or {f, t}. But did poll_query_until() return OK in this case even
though we expected a single row of 't'? My guess of how this issue happened is:
1. the first polling query after "ALTER SUBSCRIPTION CONNECTION"
passed (for some reason).
2. all wal senders exited.
3. the test fetched the pid of the wal sender with application_name 'tap_sub' but got nothing.
4. the second polling query resulted in a syntax error since $oldpid is null.
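To illustrate step 4, here is a minimal Python sketch (not the actual Perl test code) of what the polling query in [1] would look like once the pid lookup in step 3 comes back empty. The helper name build_poll_query is hypothetical, and the pid 7899 is just an illustrative value taken from the output quoted above.

```python
def build_poll_query(oldpid: str, appname: str) -> str:
    """Mimic the string interpolation of the polling query in [1]."""
    return (
        f"SELECT pid != {oldpid} FROM pg_stat_replication "
        f"WHERE application_name = '{appname}';"
    )


# Normal case: the old walsender pid was found.
ok_query = build_poll_query("7899", "tap_sub")

# Step 3 returned nothing, so the pid is interpolated as an empty string
# (assumption: an empty result becomes an empty string in the test script).
broken_query = build_poll_query("", "tap_sub")

print(ok_query)
print(broken_query)
```

With an empty pid the text after "!=" runs straight into FROM, which is not valid SQL, so the server would reject the query instead of returning 't' or 'f'.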
If the fact that two walsenders with the same application_name can be
present in the pg_stat_replication view was the cause of this issue,
poll_query_until() would have had to return OK even though we expected just 't'. I
might be missing something, though.
[1] "SELECT pid != $oldpid FROM pg_stat_replication WHERE
application_name = '$appname';"
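A rough Python model of the comparison I have in mind, assuming poll_query_until() compares the whole unaligned, tuples-only psql output against the expected string (that is my reading of it, not something verified here). The function name poll_output_matches is hypothetical.

```python
from typing import Sequence


def poll_output_matches(rows: Sequence[str], expected: str = "t") -> bool:
    """Model one polling attempt.

    Assumption: the output of the query is one value per line, and the
    whole output must equal the expected string for the poll to succeed.
    """
    stdout = "\n".join(rows)
    return stdout == expected


# One walsender with a new pid: the poll succeeds.
single_row = poll_output_matches(["t"])

# Old and new walsenders both visible: two rows, so no exact match.
two_rows_tf = poll_output_matches(["t", "f"])
two_rows_ft = poll_output_matches(["f", "t"])

# No walsender at all: empty output, also no match.
no_rows = poll_output_matches([])
```

Under this model neither {t, f} nor {f, t} equals 't', so the two-walsender case alone would keep the poll waiting rather than let it pass.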
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/