Re: Postgres backup tool recommendations for multi-terabyte database in Google Cloud

From: Craig James <cjames(at)emolecules(dot)com>
To: Craig Jackson <craig(dot)jackson(at)broadcom(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Postgres backup tool recommendations for multi-terabyte database in Google Cloud
Date: 2019-12-05 19:51:03
Message-ID: CAFwQ8rcrhPquE6cq2NjUpnHy_Wd8KK_pcXEzAE+ia2-C1sRL8Q@mail.gmail.com
Lists: pgsql-performance

On Thu, Dec 5, 2019 at 9:48 AM Craig Jackson <craig(dot)jackson(at)broadcom(dot)com>
wrote:

> Hi,
>
> We are in the process of migrating an Oracle database to Postgres in
> Google Cloud and are investigating backup/recovery tools. The database
> size is > 20 TB. We have an SLA that requires us to be able to complete a
> full restore of the database within 24 hours. We have been testing
> pgbackrest, barman, and GCP snapshots, but wanted to see if there are any
> other recommendations we should consider.
>
> *Desirable features*
> - Parallel backup/recovery
> - Incremental backups
> - Backup directly to a GCP bucket
> - Deduplication/Compression
>

For your 24-hour-restore requirement, there's an additional feature you
might consider: incremental restore, or what you might call "recovery in
place"; that is, the ability to keep a more-or-less up-to-date copy, and
then in an emergency restore only the diffs on the file system. pgbackrest
has a built-in rsync-like capability, plus a client-server architecture,
that allows it to quickly determine which files need to be updated.
Checksums are computed on each side, and data are transferred only if the
checksums differ. It's very efficient. I assume that a 20 TB database is
mostly static, with only a small fraction of the data updated in any month.
I believe the checksums are precomputed and stored in the pgbackrest
repository, so you can even do this from an Amazon S3 (or Google Cloud
Storage, the GCP equivalent) backup with just modest bandwidth usage.
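
For what it's worth, here's roughly what that looks like with pgbackrest.
The stanza name, bucket, and paths below are made up, and note that
pgbackrest's object-store support is S3-style, so on GCP you'd point it at
GCS's S3-compatible endpoint (check the docs for your version):

  # /etc/pgbackrest/pgbackrest.conf -- illustrative; credentials,
  # region, and retention options omitted
  [global]
  repo1-type=s3
  repo1-s3-bucket=my-backup-bucket          # hypothetical bucket name
  repo1-s3-endpoint=storage.googleapis.com  # GCS S3-interoperability endpoint
  repo1-path=/pgbackrest
  process-max=8        # parallel backup/restore processes
  compress-level=3     # backups are compressed by default; tune the level

  [main]               # hypothetical stanza name
  pg1-path=/var/lib/postgresql/12/main

  # Take a full backup, then periodic incrementals:
  pgbackrest --stanza=main backup --type=full
  pgbackrest --stanza=main backup --type=incr

  # "Recovery in place": refresh an existing data directory, copying
  # only files whose checksums differ from the backup:
  pgbackrest --stanza=main restore --delta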

In a cloud environment, you can do this on modestly priced hardware (a few
CPUs, modest memory). In the event of a failover, unmount your backup disk,
spin up a big server, mount the database, do the incremental restore, and
you're in business.
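
A rough sketch of that emergency sequence (device, paths, and stanza name
are again hypothetical):

  # On the newly provisioned big server, after detaching the disk
  # from the small restore host:
  mount /dev/sdb1 /var/lib/postgresql/12/main   # attach the warm copy
  pgbackrest --stanza=main restore --delta      # copy only changed files
  pg_ctl -D /var/lib/postgresql/12/main start   # recover and open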

Craig (James)

> Any suggestions would be appreciated.
>
> Craig Jackson
>
