From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Ajin Cherian <itsajin(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Failure of subscription tests with topminnow |
Date: | 2021-08-26 01:01:56 |
Message-ID: | CAD21AoAxtiP6G788h4nG89Y=E4-WbDU3UbMyT0q8TFGBWnW7uw@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Aug 25, 2021 at 11:04 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Wed, Aug 25, 2021 at 11:17 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Aug 25, 2021 at 6:10 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > I did a quick check with the following tap test code:
> > >
> > > $node_publisher->poll_query_until('postgres',
> > > qq(
> > > select 1 != foo.column1 from (values(0), (1)) as foo;
> > > ));
> > >
> > > The query returns {t, f} but poll_query_until() never finished. The
> > > same is true when the query returns {f, t}.
> > >
>
> Yes, this is true, I also see the same behaviour.
>
> >
> > This means something different is going on in Ajin's setup. Ajin, can
> > you please share how did you confirm your findings about poll_query?
>
> Relooking at my logs, I think what happens is this:
>
> 1. First walsender 'a' is running.
> 2. Second walsender 'b' starts and attempts at acquiring the slot
> finds that the slot is active for pid a.
> 3. Now both walsenders are active, the query does not return.
> 4. First walsender 'a' times out and exits.
> 5. Now only the second walsender is active and the query returns OK
> because pid != a.
> 6. Second walsender exits with error.
> 7. Another query attempts to get the pid of the running walsender for
> tap_sub but returns null because both walsender exits.
> 8. This null return value results in the next query erroring out and
> the test failing.
So this is slightly different from what we see in the topminnow
logs? According to the server logs, step #5 happened (at 18:44:38.016)
before step #4 (at 18:44:38.043).
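
For reference, here is a minimal Python model of why the quick check quoted above never finishes, and why the query in steps 3 and 5 only returns once a single walsender is left. It is a sketch, not the TAP framework code: it assumes that poll_query_until() compares the whole query output (one value per line) against the expected string "t". The helper names and the pids 100 and 200 are hypothetical.

```python
def psql_output(rows):
    """Render boolean rows one value per line, as 't' or 'f'
    (assumed to mirror unaligned, tuples-only psql output)."""
    return "\n".join("t" if value else "f" for value in rows)


def poll_matches(rows, expected="t"):
    """Assumed success condition: the entire output equals `expected`."""
    return psql_output(rows) == expected


def pid_check(active_pids, old_pid):
    """One row per active walsender: is its pid different from old_pid?"""
    return [pid != old_pid for pid in active_pids]


OLD_PID, NEW_PID = 100, 200  # hypothetical pids for walsenders 'a' and 'b'

# Quick check from the quoted test: two rows never equal a lone "t".
print(poll_matches([True, False]))   # False, output is "t\nf"
print(poll_matches([False, True]))   # False, output is "f\nt"

# Step 3: both walsenders active, so the poll keeps waiting.
print(poll_matches(pid_check([OLD_PID, NEW_PID], OLD_PID)))  # False

# Step 5: only the second walsender is left, so the poll succeeds.
print(poll_matches(pid_check([NEW_PID], OLD_PID)))           # True
```

Under this model the poll can only succeed while exactly one row is visible and that row is "t", which is why the relative order of steps #4 and #5 in the logs matters.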
>
> >Can you additionally check the value of 'state' from
> >pg_stat_replication for both the old and new walsender sessions?
>
> Yes, will try this and post a patch tomorrow.
Thanks. I guess the state of the new walsender should be "startup"
whereas the old one should be "streaming".
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/