From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Build-farm - intermittent error in 031_column_list.pl |
Date: | 2022-05-20 03:58:29 |
Message-ID: | CAA4eK1+oWn863hwr=MYXeMT4pDBpU0BXoGVt2VxnKyt7QEj+Hg@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, May 20, 2022 at 6:58 AM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> > > the apply worker will use the existing slot and replication origin
> > > corresponding to the subscription. Now, it is possible that before
> > > restart the origin has not been updated and the WAL start location
> > > points to a location prior to where PUBLICATION pub9 exists which can
> > > lead to such an error. Once this error occurs, apply worker will never
> > > be able to proceed and will always return the same error. Does this
> > > make sense?
>
> Wow. I didn't think along that line. That theory explains the silence
> and makes sense even though I don't see LSN transitions that clearly
> support it. I dimly remember a similar kind of problem.
>
> > > Unless you or others see a different theory, this seems to be the
> > > existing problem in logical replication which is manifested by this
> > > test. If we just want to fix these test failures, we can create a new
> > > subscription instead of altering the existing subscription to point to
> > > the new publication.
> > >
> >
> > If the above theory is correct then I think letting the subscriber
> > catch up via "$node_publisher->wait_for_catchup('sub1');" before
> > ALTER SUBSCRIPTION should fix this problem, because if the publisher
> > and subscriber are in sync before the ALTER, then the new publication
> > should be visible to the walsender.
>
> It looks right to me.
>
Let's wait for Tomas or others working in this area to share their thoughts.

> That time travel seems unintuitive, but it's the
> (current) way it works.
>
I have thought about it but couldn't come up with a good way to change
how it currently works. Moreover, I think it is easy to hit this in
other ways as well. Say you first create a subscription with a
non-existent publication and then perform an operation on any unrelated
table on the publisher before creating the required publication; we
will then hit exactly this "publication does not exist" problem. So I
think we may need to live with this behavior and write tests carefully.
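
Roughly, that sequence could look like the sketch below. The table,
publication, and subscription names and the connection string are made
up for illustration and do not come from the test; the exact behavior
may differ depending on timing and server version.

```sql
-- On the publisher: two tables, but no publication yet.
CREATE TABLE tab_wanted (a int);
CREATE TABLE tab_unrelated (a int);

-- On the subscriber: subscribe to a publication that does not exist yet.
-- (Hypothetical names; the connection string is a placeholder.)
CREATE SUBSCRIPTION sub_early
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION pub_missing;

-- On the publisher: a change to an unrelated table reaches the WAL
-- before the publication is created.
INSERT INTO tab_unrelated VALUES (1);

-- On the publisher: the publication is created only afterwards.
CREATE PUBLICATION pub_missing FOR TABLE tab_wanted;
```

When the walsender decodes the INSERT, it looks up pub_missing as of
that point in the WAL, where the publication does not exist yet, so the
apply worker can keep failing with the same error even after the
CREATE PUBLICATION.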
--
With Regards,
Amit Kapila.