Re: Incremental backup

From: Mladen Gogala <gogala(dot)mladen(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Incremental backup
Date: 2021-10-29 01:14:53
Message-ID: 3d5f2552-c819-c423-cb37-e190f74ea37f@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


On 10/28/21 18:07, Andreas Joseph Krogh wrote:
> I think everybody agrees that incremental backup /per database/, and
> not cluster-wide, is nice, and it would be nice if PG supported it.
> But, given the way PG is architectured, having cluster-wide WALs,
> that's not an easy task to implement.
> Repeating "other databases have it" doesn't change that.
> --
> Andreas Joseph Krogh

I am not advocating for the database level incremental backups, but all
databases that have it also have cluster wide/instance wide WAL logs.
Cluster wide WAL logs do not make database level incremental backups
hard. Both Oracle and DB2 have database level incremental backups and
both have cluster wide WAL (redo or logs). So does SQL*Server which is a
bit specific because its log is used for both recovery, as WAL, and undo
so its very different from Postgres. Also, SQL Server is, to my
knowledge, the only database which doesn't use MVCC.

The technology that all of those databases employ is a bitmap device
which has one bit per each database block. Full backups set all bits to
0 and whenever block is modified, the corresponding bit is set to 1. The
backup tool in the incremental mode then only copies blocks with the
bitmap value of 1. I am not too thrilled by that implementation. In
particular, with Oracle there were bugs with database restore,
"duplicate database" operation, global cache locks and instance latches.
I've had quite a few headaches with RMAN cumulative incremental backups.
My preferred method of backup is storage snapshot. Snapshots then can
be  backed up to other arrays (NetApp SnapVault, Hitachi HUR, EMC SRDF) 
or can be backed up to deduplicated offline storage like AWS Glacier or
EMC Data Domain using simple file level utilities. Once snapshot is
taken, it is a read-only file system and the files are no longer opened
by the database processes.

The classic file level backup tools like pg_basebackup or rman are
pretty much unusable once your database hits 50 TB or so. With 50TB
database, your RTO (Recovery Time Objective) will be in days. You can
only push around 3TB/hour down a 10Gbit Ethernet pipe. On the other hand
companies like Facebook. Amazon, Walmart or Target are losing hundreds
of thousands USD per hour of downtime. Downtime of 15 hours or longer is
completely unacceptable. Backup is only the last line of defense. It
should only be used if everything else fails. And if you have to use
file level tool like pg_basebackup and tar to restore your VLDB, you
should also get your resume ready.

Incremental backups are awkward and ungainly. You have to restore the
full backup and all incrementals taken since the last full. After that,
you must apply the remaining WAL files. So, by definition, restoring
incremental backup is, by definition, longer than restoring a full
backup. So, if you have to take a backup, get yourself a deduplicating
backup utility. Deduplication will only save the backup blocks that are
different from the previous backup. In other words, you can do a daily
full backup with the space expenditure of an incremental backup. Also,
if you need backups because of regulatory reasons (HIPAA, SOX),
incremental backups will not do.

Long story short, there are much more important things to do on Postgres
than incremental database level backups. BTW, Chuck Norris can take and
incremental database level backup of a Postgres database and recover
database without restoring the preceding full backup. I am not sure
whether Chuck Norris is a member of Postgres steering committee.

--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Zahid Rahman 2021-10-29 01:26:50 Re: database designs ERDs
Previous Message Ron 2021-10-28 22:24:55 Re: Incremental backup