Re: Postgres backup tool recommendations for multi-terabyte database in Google Cloud

From: Craig Jackson <craig(dot)jackson(at)broadcom(dot)com>
To: Craig James <cjames(at)emolecules(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Postgres backup tool recommendations for multi-terabyte database in Google Cloud
Date: 2019-12-05 21:05:05
Message-ID: CA+R1LV7uhXHSYgBM0avnW8y2sYqXzMgnyBHB55ww4GD_d4Yg6w@mail.gmail.com
Lists: pgsql-performance

Thanks, I'll check it out.

On Thu, Dec 5, 2019 at 12:51 PM Craig James <cjames(at)emolecules(dot)com> wrote:

> On Thu, Dec 5, 2019 at 9:48 AM Craig Jackson <craig(dot)jackson(at)broadcom(dot)com>
> wrote:
>
>> Hi,
>>
>> We are in the process of migrating an Oracle database to Postgres in
>> Google Cloud and are investigating backup/recovery tools. The database
>> size is > 20 TB. We have an SLA that requires us to be able to complete a
>> full restore of the database within 24 hours. We have been testing
>> pgbackrest, barman, and GCP snapshots, but wanted to see if there are any
>> other recommendations we should consider.
>>
>> *Desirable features*
>> - Parallel backup/recovery
>> - Incremental backups
>> - Backup directly to a GCP bucket
>> - Deduplication/Compression (a configuration sketch covering these
>> follows the list)
>>
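>> For illustration, a minimal pgbackrest configuration touching each of
>> those points might look something like this (an untested sketch: the
>> bucket, key path, and stanza names are placeholders, and native GCS
>> repository support depends on the pgbackrest version; older releases
>> can target GCS through its S3-compatible mode):
>>
>>   # /etc/pgbackrest/pgbackrest.conf
>>   [global]
>>   repo1-type=gcs                  # back up directly to a GCS bucket
>>   repo1-gcs-bucket=my-backup-bucket
>>   repo1-gcs-key=/etc/pgbackrest/gcs-key.json
>>   repo1-path=/pgbackrest
>>   process-max=8                   # parallel backup/restore workers
>>   compress-type=lz4               # per-file compression
>>   start-fast=y
>>
>>   [main]
>>   pg1-path=/var/lib/postgresql/12/main
>>
>> A full backup plus periodic incrementals would then be:
>>
>>   pgbackrest --stanza=main --type=full backup
>>   pgbackrest --stanza=main --type=incr backup
>>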
>
> For your 24-hour-restore requirement, there's an additional feature you
> might consider: incremental restore, or what you might call "recovery in
> place"; that is, the ability to keep a more-or-less up-to-date copy, and
> then in an emergency restore only the diffs on the file system. pgbackrest
> uses a built-in rsync-like feature, plus a client-server architecture, that
> allows it to quickly determine which disk blocks need to be updated.
> Checksums are computed on each side, and data are transferred only if the
> checksums differ. It's very efficient. I assume that a 20 TB database is
> mostly static, with only a small fraction of the data updated in any month.
> I believe the checksums are precomputed and stored in the pgbackrest
> repository, so you can even do this from an Amazon S3 bucket (or Google
> Cloud Storage, GCP's equivalent for low-cost storage) with just modest
> bandwidth usage.
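>
> Concretely, the in-place recovery described above is pgbackrest's delta
> restore, a single command (assuming a stanza named "main"):
>
>   # Reuses files in the existing data directory whose checksums match
>   # the backup, copying only the files that differ.
>   pgbackrest --stanza=main --delta restore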
>
> In a cloud environment, you can do this on modestly priced hardware (a few
> CPUs, modest memory). In the event of a failover, unmount your backup disk,
> spin up a big server, mount the disk there, do the incremental restore, and
> you're in business.
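>
> A rough sketch of that sequence on GCP (the instance and disk names are
> made up; verify the gcloud flags against current docs):
>
>   # move the disk holding the standby copy to a freshly provisioned,
>   # larger VM
>   gcloud compute instances detach-disk warm-standby --disk=pgdata
>   gcloud compute instances attach-disk big-server --disk=pgdata
>
>   # on the new server: bring the copy current, then start Postgres
>   pgbackrest --stanza=main --delta restore
>   pg_ctl -D /var/lib/postgresql/12/main start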
>
> Craig (James)
>
>> Any suggestions would be appreciated.
>>
>> Craig Jackson
>>
>

--
Craig
