Re: Rsync to a recovering streaming replica?

From: Igor Polishchuk <ora4dba(at)gmail(dot)com>
To: Scott Mead <scottm(at)openscg(dot)com>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Rsync to a recovering streaming replica?
Date: 2017-09-27 20:08:36
Message-ID: 1754AB6A-A61A-410E-A93B-0AE06958D59D@gmail.com
Lists: pgsql-general

Scott,
Thank you for your insight. I do have some extra disk and network throughput to spare. However, my question is ‘Can I run rsync while streaming is running?’
A streaming replica is a physical copy of the master, so why not? My concern is the possible silent introduction of block corruption that would not be fixed by the block copies in the WAL files. I think such corruption should not happen, and I have seen a few instances where running rsync seemed to work.
I’m curious whether anybody is aware of a situation where corruption is likely to happen.
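
To make it concrete, what I have in mind is roughly the following. This is only a sketch: the host name, paths, and the pg_start_backup/pg_stop_backup bracket are placeholders of mine, and whether this is safe against a standby that is still recovering is exactly my question.

  # On the master: mark a backup start point so the copied files are
  # covered by WAL from a known position (placeholder host/paths).
  psql -h master -c "SELECT pg_start_backup('replica_resync', true);"

  # On the replica host, while the standby keeps streaming/recovering:
  # refresh the data directory from the master. Over ssh, rsync only
  # transfers the parts of files that differ.
  rsync -a --delete \
        --exclude=pg_xlog --exclude=postmaster.pid --exclude=postmaster.opts \
        master:/var/lib/pgsql/9.6/data/ /var/lib/pgsql/9.6/data/

  # On the master: end the backup.
  psql -h master -c "SELECT pg_stop_backup();"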

Igor

> On Sep 27, 2017, at 12:48, Scott Mead <scottm(at)openscg(dot)com> wrote:
>
>
>
> On Wed, Sep 27, 2017 at 1:59 PM, Igor Polishchuk <ora4dba(at)gmail(dot)com> wrote:
> Sorry, here are the missing details, if it helps:
> Postgres 9.6.5 on CentOS 7.2.1511
>
> > On Sep 27, 2017, at 10:56, Igor Polishchuk <ora4dba(at)gmail(dot)com> wrote:
> >
> > Hello,
> > I have a multi-terabyte streaming replica of a busy database. When I set it up, repeated rsyncs take at least 6 hours each.
> > So, when I start the replica, it begins streaming, but it is many hours behind right from the start. It keeps working for hours and cannot reach a consistent state,
> > so the database does not open for queries. I have plenty of WAL files available in the master’s pg_xlog, so the replica never uses archived logs.
> > A question:
> > Should I be able to run one more rsync from the master to my replica while it is streaming?
> > The idea is to overcome the throughput limit imposed by the single recovery process on the replica and let it catch up more quickly.
> > I remember doing this many years ago on Pg 8.4, and I have also heard from other people who did it. In all cases, it seemed to work.
> > I’m just not sure whether there is a high risk of introducing some hidden data corruption, which I might not notice for a while on such a huge database.
> > Any educated opinions on the subject here?
>
> It really comes down to the amount of I/O (network and disk) your system can handle while under load. I've used 2 methods to do this in the past:
>
> - http://moo.nac.uci.edu/~hjm/parsync/
>
> parsync (parallel rsync) is nice; it does all the hard work of parallelizing rsync for you. It's just a pain to get all the prerequisites installed.
>
>
> - rsync --itemize-changes
> Essentially, use this to get a list of files, manually split them out, and fire up a number of rsyncs. parsync does this for you, but if you can't get it going for any reason, this works.
>
>
> The real trick: after you do your parallel rsync, make sure that you run one final rsync to sync up any missed items.
>
> Remember, it's all about I/O. The more parallel threads you use, the harder you'll beat up the disks / network on the master, which could impact production.
>
> Good luck
>
> --Scott
>
> >
> > Thank you
> > Igor Polishchuk
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
>
>
> --
> Scott Mead
> Sr. Architect
> OpenSCG
> http://openscg.com
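
For reference, here is my rough reading of the manual split Scott describes above, as a sketch only; the host name, paths, chunk count, and scratch file names are placeholders I made up, and it assumes GNU split and no spaces in file names.

  # Build a list of files that differ, without copying anything yet.
  rsync -a --dry-run --itemize-changes \
        master:/var/lib/pgsql/9.6/data/ /var/lib/pgsql/9.6/data/ \
    | awk '{print $2}' > /tmp/changed_files.txt

  # Split the list into 8 chunks and run one rsync per chunk in parallel.
  split -n l/8 /tmp/changed_files.txt /tmp/chunk.
  for f in /tmp/chunk.*; do
    rsync -a --files-from="$f" master:/var/lib/pgsql/9.6/data/ /var/lib/pgsql/9.6/data/ &
  done
  wait

  # One final single-threaded pass to pick up anything the parallel runs missed.
  rsync -a master:/var/lib/pgsql/9.6/data/ /var/lib/pgsql/9.6/data/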
