Re: Feedback about hybrid SAN snap and rsync'd approach for large systemcloning

From: Jerry Sievers <gsievers19(at)comcast(dot)net>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: pggeneral <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Feedback about hybrid SAN snap and rsync'd approach for large systemcloning
Date: 2022-04-19 02:35:00
Message-ID: A1B67D24-5721-4597-8F54-9217C77028A0@comcast.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Stephen, and thanks! Please see below...

> Greetings,

> * Jerry Sievers (gsievers19(at)comcast(dot)net) wrote:
> Suppose we have a DB cluster with an additional tablespace and we are
> able to make an atomic SAN snapshot of *only* the main cluster
> volume...

> The additional tablespace contains only UNLOGGED relations.

> We cannot snap that volume so we use rsync as follows...

> 1. pg_start_backup('foo');
> make SAN snapshot
> rsync the add'l tablespace
> pg_stop_backup()

> Surely you're also doing WAL archiving? You shouldn't ever be using
> start/stop backup without also doing WAL archiving as you must capture
> all of the WAL between the two.

Oh, we're doing archiving all right, via pg_receivewal.

And please stop calling me Shirley :-)

> Now provision a new cluster around the snapshot and rsync'd volume,
> rejigger the pg_tblspc link if necessary... and start it up maybe or
> not having it remain as a streaming replica.

> Dedicated backup tools know how to do tablespace remapping.

Yep, I'm aware. The fixing of symlinks isn't much of a concern here,
generally.

> It's been my experience that possibly bulky data in the additional
> tablespace does *not* need be rsync'd if we capture only the *_init
> files.

> That isn't a trivial thing to do with rsync tho and "try it and see if
> it works" is a really bad way to go about this- it's important to
> understand what the files are in the data directory, how PG uses them,
> how PG recovery works, etc. That's why there are folks who write
> dedicated backup tools for PG (myself included).

Understood. It's a legacy configuration that we're working with
that's used both for making full backups as well as frequently
refreshed non-prod systems. Some of them are large enough to make
compelling avoidance of any unnecessary data materialization.

> Id' be curious to here feedback re the sanity of this approach.

> I wouldn't recommend it is the short answer. Note that you must be
> doing WAL archiving and you really should (and will have to, as of 15)

Hmmm, we're using non-exclusive just as of my most recent
updates... but I wasn't aware that exclusive mode was going away
entirely w/V15.

> use non-exclusive backup when you're doing this and be sure to copy the
> backup_label file from the pg_stop_backup/pg_backup_stop (in v15)
> results and put that into the snapshot. If you use exclusive mode
> then if the system crashes for any reason during the rsync then you'll
> have a good chance of ending up with a system that won't come back up
> until you go muck around in the data directory. It's not good and is
> why it's now been removed.

Agree and I've certainly had a few cases of that in the past. Agree
that it'll foil the less seasoned DBA.

Yep and we do stash the labelfile output from pg_stop_backup in the
snap after it's mounted on whatever box.

> Also, you'd have to craft a pretty ugly rsync to make it not copy data
> from the unlogged tables, and when you end up with a logged table in
> that tablespace (and you will, unless you're actively monitoring for it
> and remove any that pop up...) you could end up with data corruption.

We are... but indeed the rsync include/exclude rules were fussy to
develop and IMO non-trivial to really verify, certainly not ideal.

For that matter, the event trigger that disallows creating logged
tables I'm not certain is foolproof.

> Don't think that you'll be able to use delta rsyncs with this either, as
> you may end up with files being changed without rsync realizing it and
> again end up with corruption.

We don't do that anyhow.

The snaps I'm talking about here are instantiated only once, then
written to tape or brought up as non-prod systems... then torn all the
way down.

> And would also like to know if perhaps *only* the directories under
> the rsync'd tablespace actually must be present for a successful
> recovery.

Forget I ever said the above which I'm pretty sure isn't workable and
wouldn't buy much anyhow.

> The above approach has worked numerous times even with origin
> systems having large, churny contents in the dedicated unlogged
> tablespace (which is on a much faster local NVME volume than the
> main SAN volume.)

> Dedicated backup tools already know how to recognize unlogged tables and
> skip them, along with being able to do incremental backup and delta
> restores, I'd strongly suggest you consider using such tools instead of
> trying to hack your own.

Point very well taken and it's about time my site had another look to
see if such a solution is a good fit and /or to reverify that our
backups are consistently recoverable.

Thx again for weighing in.

> Thanks,

> Stephen

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Ram Pratap Maurya 2022-04-19 03:29:49 RE: Huge archive log generate in Postgresql-13
Previous Message Tom Lane 2022-04-18 21:08:14 Re: No psql md5 auth, psql 14.1 to PG 11