Re: Missing rows after migrating from postgres 11 to 12 with logical replication

From: Lars Vonk <lars(dot)vonk(at)gmail(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Missing rows after migrating from postgres 11 to 12 with logical replication
Date: 2020-12-23 09:40:31
Message-ID: CAMX1ThheKfA5oOw9_3WQx+MxHy0ti6La6cwrijh6YzG-rWZ4zA@mail.gmail.com
Lists: pgsql-general

The full setup is:

Before:
11 primary -> 11 hot standby via binary

During migration:
11 primary -> 11 hot standby via binary
|-> 12 new instance via logical
    |-> 12 new replica via binary

After migration:
12 primary
|-> 12 replica via binary

On Tue, Dec 22, 2020 at 7:16 PM Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
wrote:

> On 12/22/20 9:10 AM, Lars Vonk wrote:
> > Did you have some other replication running on the 11 instance?
> >
> >
> > Yes the 11 instance also had another (11) replica running. (But these
> > logs are from the 12 instance)
>
> The 11 instance had the data that went missing in the 12 instance, so
> what shows up in logs for the 11 instance during this period that is
> relevant?
>
> >
> > The new 12 instance also had a replica running.
>
> So the setup was?:
>
> 1) 11 primary --> 11 standby via what replication logical or binary?
> | --> 12 new instance via logical
>
> 2) 12(new) primary --> 12(?) standby via what replication logical or
> binary?
>
> >
> > In any case what was the command logged just before the ERROR.
> >
> >
> > There is nothing logged.
> >
> > These are the only log statements just before the error message, one
> > second later the ERROR is logged:
> >
> > 2020-12-10 13:26:43 UTC::@:[5537]:LOG: checkpoints are occurring too frequently (20 seconds apart)
> > 2020-12-10 13:26:43 UTC::@:[5537]:HINT: Consider increasing the configuration parameter "max_wal_size".
> > 2020-12-10 13:26:43 UTC::@:[5537]:LOG: checkpoint starting: wal
> >
> > Lars
> >
> > On Mon, Dec 21, 2020 at 11:51 PM Adrian Klaver
> > <adrian(dot)klaver(at)aklaver(dot)com> wrote:
> >
> > On 12/21/20 2:42 PM, Lars Vonk wrote:
> > > What was being run when the above ERROR was triggered?
> > >
> > >
> > > The initial copy of a table. Other than that we ran select
> > > pg_size_pretty(pg_relation_size('table_name')) to see the current size
> > > of the table being copied to get a feeling on progress.
> > >
> > > And whenever we added a new table to the publication we ran ALTER
> > > SUBSCRIPTION migration REFRESH PUBLICATION; to add any new table to the
> > > subscription. But not around that timestamp, about 50 minutes before the
> > > first occurrence of that ERROR. (no ERRORS after prior ALTER SUBSCRIPTIONs).
> > >
> > > But after the initial copies ended there are more ERRORs about different
> > > missing WAL segments. Each missing WAL segment is logged as ERROR a
> > > couple of times and then no more. After a couple of hours no errors are
> > > logged.
> >
> > Something was looking for the WAL segment.
> >
> > Did you have some other replication running on the 11 instance?
> >
> > In any case what was the command logged just before the ERROR.
> >
> > >
> > > Lars
> > >
> >
> >
> > --
> > Adrian Klaver
> > adrian(dot)klaver(at)aklaver(dot)com
> >
>
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>
