Re: Difference in the tablespace folders on primary and secondary nodes

From: Abhishek Bhola <abhishek(dot)bhola(at)japannext(dot)co(dot)jp>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Difference in the tablespace folders on primary and secondary nodes
Date: 2023-07-27 01:30:34
Message-ID: CAEDsCzgfGbHAg4e7V8x62oaW2md2npHVDADm79NAPz_SC0jYtA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Stephen

Thank you for the very detailed reply. I tried the above method and it
works. I'm still setting up pgBackRest.

I have one last question if you can answer that too please.

> A delta restore will only restore those
> files on the replica which are different from what was in the backup and
> that can be done using timestamp/file-size because pgBackRest tracks
> that information and will set it on restore.

As per my understanding, primary and standby nodes can have exactly the
same data, with no data corruption, but still have different timestamps on
the data files in the tablespace folder. Please correct me if I am wrong.
If that's the case, then will the backup (incremental/delta) taken from
the primary node on pgBackRest, restore the files on the secondary node,
just because they have a different timestamp? Or does pgBackRest have some
mechanism to detect this and skip those files? Please assume for this case
that we ran out of capacity to store the WAL segments from the primary
while the secondary node was down.

Thanks

On Wed, 26 Jul 2023, 22:55 Stephen Frost, <sfrost(at)snowman(dot)net> wrote:

> Greetings,
>
> * Abhishek Bhola (abhishek(dot)bhola(at)japannext(dot)co(dot)jp) wrote:
> > > Basically, this is not a valid way to perform a backup/restore of PG.
> >
> > Is it not valid only for PG 15 or even for earlier versions? I have
> always
> > referred to this https://www.postgresql.org/docs/8.1/backup-online.html
>
> For earlier versions too.
>
> > > better, use an existing well maintained backup/restore tool for this
> >
> > Is there any tool you could recommend? A tool to bring back the standby
> > node when it was down for a day or so for some OS/firmware upgrade.
> > I have tried pgBackRest, but it requires a control node. So anything that
> > doesn't involve a 3rd server for this situation would be helpful.
>
> pgBackRest doesn't actually require a control node, though it's
> certainly recommended to have one.
>
> Also, if you use physical replication slots, you can have the primary
> hold on to the WAL necessary for the replica to catch back up until the
> replica is back and then it can just fetch the WAL from the primary and
> catch up without any kind of re-sync being necessary.
>
> If you're worried about running out of space on the primary for this
> (which is an entirely valid concern...) then you could ship the WAL
> files to an archive system or location using pgBackRest or possibly some
> other tool and then have the replica fetch the WAL from there once it's
> back online.
>
> pgBackRest also has the ability to do incremental backups and delta
> restores. An incremental backup will only store the data that's changed
> since the prior backup, minimizing the storage space in the repo and the
> time required for the backup. A delta restore will only restore those
> files on the replica which are different from what was in the backup and
> that can be done using timestamp/file-size because pgBackRest tracks
> that information and will set it on restore. pgBackRest does also have
> an option to do checksum-based restores, which it will automatically use
> if anything looks odd regarding the timestamps.
>
> > > this also doesn't grab and restore the absolutely required
> > > backup_label file that's returned from pg_backup_stop()
> >
> > I tried running pg_backup_start('backup') and pg_backup_stop() on my DB,
> > but I did not see any backup_label_file being created.
> > psql (15.1)
> > Type "help" for help.
> >
> > postgres=# select pg_backup_start('backup');
> > pg_backup_start
> > -----------------
> > 68/32000028
> > (1 row)
> >
> > postgres=# select pg_backup_stop();
> > NOTICE: all required WAL segments have been archived
> > pg_backup_stop
> >
> -------------------------------------------------------------------------------
> > (68/32000100,"START WAL LOCATION: 68/32000028 (file
> > 000000010000006800000032)+
> > CHECKPOINT LOCATION: 68/32000060
> > +
> > BACKUP METHOD: streamed
> > +
> > BACKUP FROM: primary
> > +
> > START TIME: 2023-07-26 08:51:28 JST
> > +
> > LABEL: backup
> > +
> > START TIMELINE: 1
> > +
> > ","16724 /PGDATA/datadg/tbs1
> > +
> > 16725 /PGDATA/datadg/tbs2
> > +
> > ")
> > (1 row)
> >
> > I read the documentation on this page
> > https://www.postgresql.org/docs/current/functions-admin.html
>
> What's returned from pg_backup_stop() is the backup_label that needs to
> be stored with the files which were part of the backup. Note that you
> should *not* store the backup_label in the data directory of the primary
> because if the system crashes then it won't come back up without someone
> going in and removing that file. That's how the old exclusive method
> worked which was deprecated and then finally removed because of this
> issue.
>
> > > The desired contents of the backup label file and the tablespace map
> file
> > > are returned as part of the result of the function and must be written
> to
> > > files in the backup area.
> >
> > I don't understand, "*must be written to files in the backup area*". Does
> > it mean we need to manually create a file first on the master node and
> then
> > rsync it to the backup node?
>
> No, don't do that. You should back up all the files and then store the
> backup_label with those files. Do *not* put a backup_label into the
> data directory on the primary.
>
> * Abhishek Bhola (abhishek(dot)bhola(at)japannext(dot)co(dot)jp) wrote:
> > I got the latest documentation
> > <
> https://www.postgresql.org/docs/current/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP
> >and
> > understood that I was supposed to run
> > `select * from pg_backup_stop();` and store the labelfile output to a
> file
> > in the data directory of the secondary node.
>
> The backup_label should be stored with the backed up files and then
> restored as part of restoring the backup, yes.
>
> > I also understood that this Low-level API backup is not really
> > a recommended way and I need to move to another method.
>
> I would recommend that, yes.
>
> > However, till I fully set up a tool like pgBackRest (of which you are one
> > of the contributors - very impressive!) for PROD, can you please tell me:
> > 1) Are there any other options that are safe to use yet fast? Like you
> said
> > rsync --size-only would miss a lot of changes, but anything that would
> also
> > not start syncing every data file with only different timestamp, even
> > though it is exactly the same, including the checksum.
>
> Re-syncing a replica due to it being offline for OS patching or such
> really isn't typically the approach to use- better to have a WAL archive
> and just let the replica catch up by playing the WAL forward. A re-sync
> might make sense if the replica becomes very far out of date such that
> there's a huge backlog of WAL or if you're generating a huge amount of
> WAL that WAL replay just can't keep up, but I'm guessing neither of
> those are really the case here. If they are though, increental backups
> and delta restores with pgBackRest is an approach that has worked for
> high-throughput systems like those.
>
> > 2) While rsyncing from the master node, do I need to exclude `pg_wal`
> > folder?
>
> Along with a bunch of other ones, as discussed in the documentation.
>
> > 3) The replica/standby node should have access to the WAL segments
> archived
> > during time between start and stop backup, right?
>
> To perform any recovery of an online backup, all of the WAL generated
> during the backup (between the pg_backup_start and pg_backup_stop) must
> be preserved and made available via the restore_command for replay.
>
> > 4) What kind of data corruption would have been introduced due to the
> > absence of labelfile and if there is a way to identify the corruption? So
> > for example, if I am able to reindex all the tables, select all the data
> > and vacuum the db, do I still need to pg_dump and pg_restore those DBs?
>
> Any kind. Even taking those steps wouldn't provide a guarantee that
> there's no corruption. Doing a pg_dump/pg_restore would at least help
> you find any data that is invalid according to the constraints which are
> set in the database and that's a good first step but there could still
> be other corruption.
>
> Note that the corruption being discussed here would only be on the
> replica, because that's where the writes that happened between the
> backup start and the backup end got lost due to the lack of a backup
> label. Of course, if you've failed over to the replica, then you'd end
> up with the now-primary having that corruption.
>
> Thanks,
>
> Stephen
>

--
_This correspondence (including any attachments) is for the intended
recipient(s) only. It may contain confidential or privileged information or
both. No confidentiality or privilege is waived or lost by any
mis-transmission. If you receive this correspondence by mistake, please
contact the sender immediately, delete this correspondence (and all
attachments) and destroy any hard copies. You must not use, disclose, copy,
distribute or rely on any part of this correspondence (including any
attachments) if you are not the intended
recipient(s).本メッセージに記載および添付されている情報(以下、総称して「本情報」といいます。)は、本来の受信者による使用のみを意図しています。誤送信等により本情報を取得された場合でも、本情報に係る秘密、または法律上の秘匿特権が失われるものではありません。本電子メールを受取られた方が、本来の受信者ではない場合には、本情報及びそのコピーすべてを削除・破棄し、本電子メールが誤って届いた旨を発信者宛てにご通知下さいますようお願いします。本情報の閲覧、発信または本情報に基づくいかなる行為も明確に禁止されていることをご了承ください。_

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Stephen Frost 2023-07-27 02:39:03 Re: Difference in the tablespace folders on primary and secondary nodes
Previous Message Michael Paquier 2023-07-27 00:55:17 Re: Query on Primary_conninfo