Proposal: Incremental Backup

From: Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Proposal: Incremental Backup
Date: 2014-07-25 13:14:22
Message-ID: 53D2582E.3080105@2ndquadrant.it
Lists: pgsql-hackers

0. Introduction:
=================================
This is a proposal for adding incremental backup support to the streaming
replication protocol, and hence to the pg_basebackup command.

1. Proposal
=================================
Our proposal is to introduce the concept of a backup profile. The backup
profile consists of a file with one line per file detailing tablespace,
path, modification time, size and checksum.
Using that file, the BASE_BACKUP command can decide which files need to
be sent again and which have not changed. The algorithm should be very
similar to rsync's, but since our data files are never larger than 1 GB
(PostgreSQL splits relations into 1 GB segment files by default),
whole-file granularity is probably sufficient: there is no need to copy
parts of files, only whole files.
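As an illustration, the per-file decision described above can be sketched in Python. The proposal does not fix the on-disk syntax of the profile or the checksum algorithm, so the tab-separated layout, the `ProfileEntry` type, the helper names and the dummy checksum values below are all hypothetical. The sketch treats a file as needing to be sent when it is new or when its size or checksum differs from the previous profile.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Hypothetical profile key: (tablespace, relative path)
Key = Tuple[str, str]


@dataclass(frozen=True)
class ProfileEntry:
    tablespace: str  # tablespace identifier (format assumed)
    path: str        # file path relative to the tablespace root
    mtime: int       # modification time, seconds since the epoch
    size: int        # file size in bytes
    checksum: str    # hex digest; the algorithm is an assumption


def parse_profile(lines: Iterable[str]) -> Dict[Key, ProfileEntry]:
    """Parse an assumed tab-separated profile: tablespace, path, mtime, size, checksum."""
    profile: Dict[Key, ProfileEntry] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        tablespace, path, mtime, size, checksum = line.split("\t")
        profile[(tablespace, path)] = ProfileEntry(
            tablespace, path, int(mtime), int(size), checksum)
    return profile


def files_to_send(previous: Dict[Key, ProfileEntry],
                  current: Dict[Key, ProfileEntry]) -> List[Key]:
    """Files that are new, or whose size or checksum differs from the previous backup."""
    changed = []
    for key, entry in current.items():
        old = previous.get(key)
        if old is None or old.size != entry.size or old.checksum != entry.checksum:
            changed.append(key)
    return sorted(changed)


def files_removed(previous: Dict[Key, ProfileEntry],
                  current: Dict[Key, ProfileEntry]) -> List[Key]:
    """Files listed in the previous profile that no longer exist."""
    return sorted(previous.keys() - current.keys())


# Dummy profiles: checksums such as "aaaa" are placeholders, not real digests.
prev = parse_profile([
    "pg_default\tbase/1/1259\t1406290000\t8192\taaaa\n",
    "pg_default\tbase/1/1249\t1406290000\t16384\tbbbb\n",
])
curr = parse_profile([
    "pg_default\tbase/1/1259\t1406290500\t8192\tcccc\n",  # rewritten in place
    "pg_default\tbase/1/1249\t1406290000\t16384\tbbbb\n",  # unchanged
    "pg_default\tbase/1/16384\t1406290600\t0\tdddd\n",     # newly created
])
print(files_to_send(prev, curr))
# [('pg_default', 'base/1/1259'), ('pg_default', 'base/1/16384')]
```

Only the previous profile is needed for the comparison, never the previous backup's files themselves. The recorded modification time is not used in this sketch; it could serve as a cheaper pre-check before computing a checksum.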

This approach also has some advantages over using rsync to take
a physical backup: It does not require the files from the previous
backup to be checksummed again, and they could even reside on some form
of long-term, not-directly-accessible storage, like a tape cartridge or
somewhere in the cloud (e.g. Amazon S3 or Amazon Glacier).

The same mechanism could also support a 'refresh' mode, allowing the
pg_basebackup command to bring an old backup directory up to date with
a new backup.
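A minimal sketch of the 'refresh' idea, assuming the client already has the contents of the files the server sent and the list of files that disappeared since the old backup (for example, obtained by comparing the old and new profiles). `refresh_directory` and `list_files` are hypothetical helpers, not part of pg_basebackup.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List


def refresh_directory(backup_dir: Path, received: Dict[str, bytes],
                      removed: Iterable[str]) -> None:
    """Update an old backup directory in place (hypothetical 'refresh' mode).

    received: relative path -> new contents, for files the server sent
    removed:  relative paths present in the old backup but not in the new one
    """
    for rel, data in received.items():
        target = backup_dir / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        tmp = target.with_name(target.name + ".tmp")
        tmp.write_bytes(data)
        os.replace(tmp, target)  # rename over the old file in one step
    for rel in removed:
        (backup_dir / rel).unlink(missing_ok=True)


def list_files(root: Path) -> List[str]:
    """All regular files under root, as sorted POSIX-style relative paths."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "base/1").mkdir(parents=True)
    (root / "base/1/1259").write_bytes(b"old")
    (root / "base/1/9999").write_bytes(b"dropped")
    refresh_directory(root, {"base/1/1259": b"new", "base/1/16384": b"created"},
                      ["base/1/9999"])
    print(list_files(root))  # ['base/1/1259', 'base/1/16384']
```

Writing each file under a temporary name and renaming it means an interrupted refresh never leaves a half-written copy under the final name, although the directory as a whole may still mix old and new files until the refresh completes.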

The final piece of this architecture is a new program called
pg_restorebackup which is able to operate on a "chain of incremental
backups", allowing the user to build a usable PGDATA from them or to
perform maintenance operations such as verifying the checksums or
estimating the final size of the recovered PGDATA.
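One way pg_restorebackup could reconstruct PGDATA from such a chain is sketched below, assuming that every backup records a profile of all files present at that time and physically stores only the files that differ from the previous backup. Under that assumption, each file in the newest profile is taken from the most recent backup in the chain that stores it, and the final size estimate is the sum of sizes in the newest profile. `Backup`, `plan_restore` and `estimated_size` are illustrative names only, and checksum verification is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Backup:
    label: str
    profile: Dict[str, int]  # every file present at backup time -> size in bytes
    stored: Set[str]         # files whose contents this backup actually contains


def plan_restore(chain: List[Backup]) -> Dict[str, str]:
    """Map each file in the newest profile to the backup holding its contents.

    The chain is ordered oldest (full backup) to newest. Assuming every
    incremental stores all files that changed since the previous backup,
    the most recent backup that stores a file has its latest contents.
    """
    if not chain:
        raise ValueError("empty backup chain")
    plan: Dict[str, str] = {}
    for path in chain[-1].profile:
        for backup in reversed(chain):
            if path in backup.stored:
                plan[path] = backup.label
                break
        else:
            raise ValueError(f"{path} is not stored in any backup of the chain")
    return plan


def estimated_size(chain: List[Backup]) -> int:
    """Size of the reconstructed PGDATA, taken from the newest profile."""
    return sum(chain[-1].profile.values())


full = Backup("full", {"base/1/1259": 8192, "base/1/1249": 16384},
              {"base/1/1259", "base/1/1249"})
inc1 = Backup("inc1", {"base/1/1259": 8192, "base/1/1249": 16384, "base/1/16384": 0},
              {"base/1/1259", "base/1/16384"})
inc2 = Backup("inc2", {"base/1/1259": 8192, "base/1/1249": 24576, "base/1/16384": 8192},
              {"base/1/1249", "base/1/16384"})
chain = [full, inc1, inc2]
print(plan_restore(chain))
# {'base/1/1259': 'inc1', 'base/1/1249': 'inc2', 'base/1/16384': 'inc2'}
print(estimated_size(chain))  # 40960
```

A missing link in the chain shows up as a file that no backup stores, which is why `plan_restore` raises instead of silently skipping it.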

We created a wiki page with all implementation details at
https://wiki.postgresql.org/wiki/Incremental_backup

2. Goals
=================================
The main goal of incremental backup is to reduce the size of the backup.
A secondary goal is to reduce backup time as well.

3. Development plan
=================================
Our proposed development plan consists of four phases:

Phase 1: Add ‘PROFILE’ option to ‘BASE_BACKUP’
Phase 2: Add ‘INCREMENTAL’ option to ‘BASE_BACKUP’
Phase 3: Support for PROFILE and INCREMENTAL in pg_basebackup
Phase 4: pg_restorebackup

We would like to reach consensus on our design here before we start
implementing it.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco(dot)nenciarini(at)2ndQuadrant(dot)it | www.2ndQuadrant.it
