From: | Lars Vonk <lars(dot)vonk(at)gmail(dot)com> |
---|---|
To: | Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Missing rows after migrating from postgres 11 to 12 with logical replication |
Date: | 2020-12-23 09:40:31 |
Message-ID: | CAMX1ThheKfA5oOw9_3WQx+MxHy0ti6La6cwrijh6YzG-rWZ4zA@mail.gmail.com |
Lists: | pgsql-general |
The full setup is:

Before:
11 primary -> 11 hot standby via binary

During migration:
11 primary -> 11 hot standby via binary
|-> 12 new instance via logical
|-> 12 new replica via binary

After migration:
12 primary
|-> 12 replica via binary
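
As a rough way to confirm which links in the setup above are physical (binary) and which are logical, the standard catalog views can be queried. This is only a minimal sketch and was not run as part of this thread. It assumes nothing beyond the built-in views in PostgreSQL 11 and 12. Note that a physical standby appears in pg_replication_slots only if it was configured to use a slot.

```sql
-- On the 11 primary (publisher): list replication slots.
-- slot_type is 'physical' for binary standbys that use a slot
-- and 'logical' for the slot created by the subscription.
SELECT slot_name, slot_type, active, restart_lsn
FROM pg_replication_slots;

-- On the 11 primary: list connected WAL senders, with or without slots.
SELECT application_name, state, sync_state
FROM pg_stat_replication;

-- On the 12 instance (subscriber): per-table sync state of the subscription.
-- srsubstate: i = initialize, d = data is being copied,
--             s = synchronized, r = ready (normal replication).
SELECT srrelid::regclass AS table_name, srsubstate
FROM pg_subscription_rel
ORDER BY 1;
```

A table that never reaches state r on the subscriber, or a logical slot whose restart_lsn points at WAL that has already been removed, would be places to look when rows go missing.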
On Tue, Dec 22, 2020 at 7:16 PM Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
wrote:
> On 12/22/20 9:10 AM, Lars Vonk wrote:
> > Did you have some other replication running on the 11 instance?
> >
> >
> > Yes the 11 instance also had another (11) replica running. (But these
> > logs are from the 12 instance)
>
> The 11 instance had the data that went missing in the 12 instance, so
> what shows up in logs for the 11 instance during this period that is
> relevant?
>
> >
> > The new 12 instance also had a replica running.
>
> So the setup was?:
>
> 1) 11 primary --> 11 standby via what replication logical or binary?
> | --> 12 new instance via logical
>
> 2) 12(new) primary --> 12(?) standby via what replication logical or
> binary?
>
> >
> > In any case what was the command logged just before the ERROR.
> >
> >
> > There is nothing logged.
> >
> > These are the only log statements just before the error message, one
> > second later the ERROR is logged:
> >
> > 2020-12-10 13:26:43 UTC::@:[5537]:LOG: checkpoints are occurring too frequently (20 seconds apart)
> > 2020-12-10 13:26:43 UTC::@:[5537]:HINT: Consider increasing the configuration parameter "max_wal_size".
> > 2020-12-10 13:26:43 UTC::@:[5537]:LOG: checkpoint starting: wal
> >
> > Lars
> >
> > On Mon, Dec 21, 2020 at 11:51 PM Adrian Klaver
> > <adrian(dot)klaver(at)aklaver(dot)com> wrote:
> >
> > On 12/21/20 2:42 PM, Lars Vonk wrote:
> > > What was being run when the above ERROR was triggered?
> > >
> > >
> > > The initial copy of a table. Other than that we ran select
> > > pg_size_pretty(pg_relation_size('table_name')) to see the current size
> > > of the table being copied to get a feeling on progress.
> > >
> > > And whenever we added a new table to the publication we ran ALTER
> > > SUBSCRIPTION migration REFRESH PUBLICATION; to add any new table to the
> > > subscription. But not around that timestamp, about 50 minutes before the
> > > first occurrence of that ERROR. (no ERRORS after prior ALTER SUBSCRIPTIONs).
> > >
> > > But after the initial copies ended there are more ERRORs on different
> > > WAL segments missing. Each missing WAL segment is logged as ERROR a
> > > couple of times and then no more. After a couple of hours no errors are
> > > logged.
> >
> > Something was looking for the WAL segment.
> >
> > Did you have some other replication running on the 11 instance?
> >
> > In any case what was the command logged just before the ERROR.
> >
> > >
> > > Lars
> > >
> >
> >
> > --
> > Adrian Klaver
> > adrian(dot)klaver(at)aklaver(dot)com
> >
>
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>