Re: WAL accumulating, Logical Replication pg 13

From: Willy-Bas Loos <willybas(at)gmail(dot)com>
To: Tomas Pospisek <tpo2(at)sourcepole(dot)ch>
Cc: Vijaykumar Jain <vijaykumarjain(dot)github(at)gmail(dot)com>, pgsql-general <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: WAL accumulating, Logical Replication pg 13
Date: 2021-06-11 07:51:59
Message-ID: CAHnozTgA6oztDwcQsr0nREOBNFKF4f6Gouja7pAYpXn158mC4Q@mail.gmail.com
Lists: pgsql-general

Hi, I was going to follow up on this one, sorry for the long silence.
The replication is working fine now, and I have no idea what the problem
was. Not cool.
If I find out, I will let you know.

On Mon, May 31, 2021 at 6:06 PM Tomas Pospisek <tpo2(at)sourcepole(dot)ch> wrote:

> Hi Willy-Bas Loos,
>
> On 31.05.21 17:32, Willy-Bas Loos wrote:
> >
> >
> > On Mon, May 31, 2021 at 4:24 PM Vijaykumar Jain
> > <vijaykumarjain(dot)github(at)gmail(dot)com
> > <mailto:vijaykumarjain(dot)github(at)gmail(dot)com>> wrote:
> >
> > So I got it all wrong it seems :)
> >
> > Thank you for taking the time to help me!
> >
> > You upgraded to pg13 fine, but while on pg13 you have issues with
> > logical replication?
> >
> > Yes, the upgrade went fine. So here are some details:
> > I already had londiste running on postgres 9.3, but londiste wouldn't
> > run on Debian 10.
> > So I first made the new server Debian 9 with postgres 9.6 and I started
> > replicating with londiste from 9.3 to 9.6.
> > When all was ready, I stopped the replication to the 9.6 server and
> > deleted all londiste & pgq content with drop schema cascade.
> > Then I upgraded the server to Debian 10. Then I used pg_upgrade to
> > upgrade from postgres 9.6 to 13. (PostGIS versions were kept compatible.)
> > Then I added logical replication and a third server as a subscriber.
> >
> > I was going to write that replication is working fine (since the table
> > contains a lot of data and there are no conflicts in the log), but it
> > turns out that it isn't.
> > The subscriber is behind and it looks like there hasn't been any
> > incoming data after the initial data synchronization.
> > So at least now I know that the WAL is being retained for a reason. The
> > connection is working properly (via psql anyway).
>
> I may once have had a similar problem: either some ports needed for
> replication were firewalled off, or the master had the wrong IP address
> for the old master (now the standby server), or something along those lines.
>
> There was absolutely no word anywhere in any log about the problem. I was
> just seeing the new postgres master not starting up after hours and
> hours of waiting after a failover. I somehow found out about the
> required port being blocked (I don't remember how, maybe by seeing the
> unanswered SYNs in tcpdump? Or via ufw log entries?).
>
> > I will also look into how to diagnose this from the system tables, e.g.
> > subtracting LSNs to get some quantitative measure for the lag.
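
A minimal sketch of the LSN subtraction idea, assuming the two LSNs have already been read on the publisher, for example pg_current_wal_lsn() and confirmed_flush_lsn from pg_replication_slots. An LSN is printed as two hexadecimal numbers separated by a slash (high 32 bits / low 32 bits), so the lag in bytes is the difference of the two 64-bit values. The function names and the sample LSN values below are hypothetical, chosen only for illustration; on the server itself pg_wal_lsn_diff() does the same subtraction.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to a 64-bit integer."""
    high, low = lsn.split("/")
    # The part before the slash is the high 32 bits, the part after is the low 32 bits.
    return (int(high, 16) << 32) | int(low, 16)


def lsn_lag_bytes(current_lsn: str, confirmed_lsn: str) -> int:
    """Return how many bytes of WAL lie between the two positions."""
    return lsn_to_int(current_lsn) - lsn_to_int(confirmed_lsn)


if __name__ == "__main__":
    # Hypothetical values, not taken from the servers discussed in this thread.
    current = "16/B374D848"     # e.g. the result of pg_current_wal_lsn()
    confirmed = "16/B3000000"   # e.g. confirmed_flush_lsn of the slot
    print(f"lag: {lsn_lag_bytes(current, confirmed)} bytes")
```

A lag that keeps growing while the publisher is writing would be consistent with the subscriber not consuming changes, which is also what causes WAL to be retained.
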
> >
> >
> >
> > There is a path in the postgresql source user subscription folder
> > iirc which covers various logical replication scenarios.
> > That may help you just in case.
> >
> > OK, so comments in the source code you mean?
> >
>
>

--
Willy-Bas Loos
