Re: Dealing with latency to replication slave; what to do?

From: Rory Falloon <rfalloon(at)gmail(dot)com>
To: andres(at)anarazel(dot)de
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Dealing with latency to replication slave; what to do?
Date: 2018-07-24 20:08:38
Message-ID: CANP_6+NRfinHdxixJz3YoxAgc6oTk8=OdCGce_m-EjF8=emH+Q@mail.gmail.com
Lists: pgsql-general

Hi Andres,

Regarding your first reply: I inferred that because I saw those messages
at the same time the replication stream fell behind. What other logs
would be more pertinent to this situation?

On Tue, Jul 24, 2018 at 4:02 PM Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2018-07-24 15:39:32 -0400, Rory Falloon wrote:
> > Looking for any tips here on how to best maintain a replication slave
> > which is operating under some latency between networks - around 230ms.
> > On a good day/week, replication will keep up for a number of days;
> > however, when the link is under higher than average usage, replication
> > can stay active for merely minutes before falling behind again.
> >
> > 2018-07-24 18:46:14 GMTLOG: database system is ready to accept read only connections
> > 2018-07-24 18:46:15 GMTLOG: started streaming WAL from primary at 2B/93000000 on timeline 1
> > 2018-07-24 18:59:28 GMTLOG: incomplete startup packet
> > 2018-07-24 19:15:36 GMTLOG: incomplete startup packet
> > 2018-07-24 19:15:36 GMTLOG: incomplete startup packet
> > 2018-07-24 19:15:37 GMTLOG: incomplete startup packet
> >
> > As you can see above, it lasted about half an hour before falling out of
> > sync.
>
> How can we see that from the above? The "incomplete startup messages"
> are independent of streaming rep? I think you need to show us more logs.
>
>
> > On the master, I have wal_keep_segments=128. What is happening when I see
> > "incomplete startup packet" - is it simply that the slave has fallen
> > behind and cannot 'catch up' using the WAL segments quickly enough? I
> > assume the slave is using the WAL segments to replay changes and,
> > assuming there are enough WAL segments to cover the period it cannot
> > stream properly, it will eventually recover?
>
> You might want to look into replication slots to ensure the primary
> keeps the necessary segments, but not more, around. You might also want
> to look at wal_compression, to reduce the bandwidth usage.
>
> Greetings,
>
> Andres Freund
>
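
As a rough illustration of the wal_keep_segments question above, the
arithmetic can be sketched as follows. This is only a sketch under stated
assumptions: it assumes the default 16 MiB WAL segment size, the function
names are made up for illustration, and the primary LSN used in the example
(2C/20000000) is hypothetical; only the standby position 2B/93000000 comes
from the log above. In practice the positions would come from the server
itself (for example pg_current_wal_lsn() on the primary and
pg_last_wal_replay_lsn() on the standby in PostgreSQL 10 and later).

```python
# Assumption: default WAL segment size of 16 MiB.
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def lsn_to_bytes(lsn: str) -> int:
    """Convert an LSN such as '2B/93000000' to an absolute byte position.

    An LSN is printed as two hexadecimal numbers: the high 32 bits and
    the low 32 bits of a 64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(primary_lsn: str, standby_lsn: str) -> int:
    """Bytes of WAL the standby still has to receive or replay."""
    return lsn_to_bytes(primary_lsn) - lsn_to_bytes(standby_lsn)


def retained_bytes(wal_keep_segments: int) -> int:
    """Minimum amount of WAL kept on the primary by wal_keep_segments."""
    return wal_keep_segments * WAL_SEGMENT_BYTES


def within_retained_wal(primary_lsn: str, standby_lsn: str,
                        wal_keep_segments: int) -> bool:
    """Conservative check: is the lag covered by the guaranteed minimum?

    The primary may keep more WAL than this minimum, so False means
    "not guaranteed", not "definitely lost".
    """
    return lag_bytes(primary_lsn, standby_lsn) <= retained_bytes(wal_keep_segments)


if __name__ == "__main__":
    standby = "2B/93000000"   # from the standby log above
    primary = "2C/20000000"   # hypothetical primary position
    print(lag_bytes(primary, standby) // WAL_SEGMENT_BYTES, "segments behind")
    print("covered by wal_keep_segments=128:",
          within_retained_wal(primary, standby, 128))
```

With wal_keep_segments=128 the guaranteed minimum is 2 GiB of WAL, so a
standby that falls further behind than that can no longer rely on the
primary still having the segments it needs. A replication slot, as
suggested above, removes that fixed limit.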
