Re: Streaming Replication Error

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: soumik(dot)bhattacharjee(at)kpn(dot)com
Cc: pgsql-admin(at)lists(dot)postgresql(dot)org, pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Streaming Replication Error
Date: 2019-12-20 20:38:17
Message-ID: CAMkU=1yxVevNqs6rvWAyt5Le+OVM8ZdWHJ0MWKguMmrvnYgpLg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

On Fri, Dec 20, 2019 at 2:12 PM <soumik(dot)bhattacharjee(at)kpn(dot)com> wrote:

> - Did a *pg_restore* on MASTER (The existing instance - MASTER)
>
> - The SLAVE has all the data after the restore was done in
> MASTER – both were in sync.
>
How did you determine that it had all the data? I suspect that they fell
out of sync towards the end of the pg_restore, and your method just
couldn't detect this fact. At least, I can't think of anything which would
cause them to lose sync exactly at the end of pg_restore. Maybe it was
just that the first checkpoint after pg_restore was finished caused the
necessary WAL files to be recycled. It could have just as easily been a
checkpoint running during the pg_restore which caused the problem, but by
luck it was not.

Please let me know if I did something wrong above step highlighted ?
>

I don't think you did anything objectively wrong. You could argue that
using wal_keep_segments rather than a replication slot was wrong, or you
could say it was not wrong but just a calculated risk. In this case, it
seems the risk was realized. Using a replication slot would also be a
risk, the risk in that case being that the streaming to replica can't keep
up, and so pg_wal fills up to capacity and crashes the master. You have to
decide what risk you would rather take.

> Does that mean I cannot refresh the MASTER anytime which should replicate
> to SLAVE?
>

Usually the master is your production server. Why would you be refreshing
it? Where would you be refreshing it from? What other server exists that
contains a higher level of truth than what your master production server
already has?

You can use a replication slot, you can increase wal_keep_segments to a
larger value (although there is no way to know with certainty ahead of time
what value will be large enough), or you can just deal with the risk that
your replica may occasionally lose sync and need to be recreated. You
might also be able to change the topology so your current replica and
current master both stream from the higher-source-of-truth server, rather
than cascading changes, first logically and then physically. There is no
correct answer, you have to understand and weigh the balance of risks for
yourself.

Cheers,

Jeff

>

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message soumik.bhattacharjee 2019-12-20 20:45:17 RE: Streaming Replication Error
Previous Message soumik.bhattacharjee 2019-12-20 19:12:42 RE: Streaming Replication Error