From: | Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: block-level incremental backup |
Date: | 2019-07-10 18:16:59 |
Message-ID: | bc1b3253-8deb-a8f4-7bf3-4e5cef3d3fd6@postgrespro.ru |
Lists: | pgsql-hackers |
23.04.2019 14:08, Anastasia Lubennikova wrote:
> I'm volunteering to write a draft patch or, more likely, set of
> patches, which
> will allow us to discuss the subject in more detail.
> And to do that I wish we agree on the API and data format (at least
> broadly).
> Looking forward to hearing your thoughts.
Though the previous discussion stalled,
I still hope that we can agree on basic points such as the map file
format and the protocol extension,
which are necessary to start implementing the feature.
--------- Proof Of Concept patch ---------
Attached is a prototype of incremental pg_basebackup,
which consists of two features:
1) To perform an incremental backup, call pg_basebackup with a
new argument:
pg_basebackup -D 'basedir' --prev-backup-start-lsn 'lsn'
where lsn is the start_lsn of the parent backup (it can be found in
the "backup_label" file).
It calls the BASE_BACKUP replication command with a new argument,
PREV_BACKUP_START_LSN 'lsn'.
For datafiles, only pages with LSN > prev_backup_start_lsn are
included in the backup.
They are saved into a 'filename.partial' file, and a 'filename.blockmap'
file contains an array of their BlockNumbers.
For example, if we backed up blocks 1, 3, and 5, 'filename.partial' will
contain 3 blocks, and 'filename.blockmap' will contain the array {1,3,5}.
Non-datafiles use the same format as before.
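To make the .partial/.blockmap layout concrete, here is a minimal in-memory sketch of the page filter described above. It is not code from the attached patch: collect_changed_blocks() and the page_lsns array (standing in for reading the LSN from each page header) are names invented for illustration, and BLCKSZ is assumed to be the default 8192.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192             /* assumption: default PostgreSQL page size */

typedef uint32_t BlockNumber;   /* same widths as the PostgreSQL typedefs */
typedef uint64_t XLogRecPtr;

/*
 * Hypothetical helper: copy every page whose LSN is newer than
 * prev_backup_start_lsn into 'partial' and append its block number to
 * 'blockmap'. Both output buffers must have room for nblocks entries.
 * Returns the number of pages copied.
 */
size_t
collect_changed_blocks(const char *datafile, const XLogRecPtr *page_lsns,
                       BlockNumber nblocks, XLogRecPtr prev_backup_start_lsn,
                       char *partial, BlockNumber *blockmap)
{
    size_t      nchanged = 0;

    assert(datafile != NULL && page_lsns != NULL);
    assert(partial != NULL && blockmap != NULL);

    for (BlockNumber blkno = 0; blkno < nblocks; blkno++)
    {
        if (page_lsns[blkno] > prev_backup_start_lsn)
        {
            /* the i-th page of 'partial' belongs to block blockmap[i] */
            memcpy(partial + nchanged * BLCKSZ,
                   datafile + (size_t) blkno * BLCKSZ, BLCKSZ);
            blockmap[nchanged++] = blkno;
        }
    }
    return nchanged;
}
```

With six pages of which blocks 1, 3 and 5 carry an LSN above the threshold, this fills 'partial' with three pages and 'blockmap' with {1,3,5}, matching the example above.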
2) To merge an incremental backup into a full backup, call
pg_basebackup -D 'basedir' --incremental-pgdata 'incremental_basedir'
--merge-backups
It moves all files from 'incremental_basedir' to 'basedir', merging
'.partial' files into their full counterparts.
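The per-file merge step can be sketched as follows. This is only an illustration of how a '.partial' file and its '.blockmap' might be applied to the full copy, working on in-memory buffers; merge_partial_file() is a hypothetical name and the attached patch may well do it differently (for example, it would also have to handle files that grew since the full backup).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192             /* assumption: default PostgreSQL page size */

typedef uint32_t BlockNumber;

/*
 * Hypothetical merge step: the i-th page of 'partial' overwrites block
 * blockmap[i] of 'full'. Returns 0 on success. Returns -1, leaving
 * 'full' untouched, if any block number lies beyond the end of the full
 * file; a real tool would have to extend the file instead.
 */
int
merge_partial_file(char *full, BlockNumber full_nblocks,
                   const char *partial, const BlockNumber *blockmap,
                   size_t nchanged)
{
    assert(full != NULL);

    /* validate first so that a bad map does not leave a half-merged file */
    for (size_t i = 0; i < nchanged; i++)
    {
        if (blockmap[i] >= full_nblocks)
            return -1;
    }

    for (size_t i = 0; i < nchanged; i++)
        memcpy(full + (size_t) blockmap[i] * BLCKSZ,
               partial + i * BLCKSZ, BLCKSZ);

    return 0;
}
```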
--------- Questions to discuss ---------
Please note that this is just a proof-of-concept patch and it can be
optimized in many ways.
Let's concentrate on issues that affect the protocol or data format.
1) Whether we collect block maps using a simple "read everything page by
page" approach,
WAL scanning, or any other page tracking algorithm, we must choose a
map format.
I implemented the simplest one, but there are other ideas:
- We can have a map not per file, but per relation or maybe per tablespace,
which would make the implementation more complex, but probably more efficient.
The only problem I see with the existing implementation is that even if only
a few blocks changed,
we still must pad the block map to 512 bytes per tar format requirements.
- We can save LSNs into the block map.
    typedef struct BlockMapItem {
        BlockNumber blkno;
        XLogRecPtr  lsn;
    } BlockMapItem;
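To put rough numbers on the two items in this list, the sketch below estimates how many bytes one block map member occupies in the tar stream: a 512-byte header plus the map data rounded up to 512 bytes. The helper names are hypothetical, and sizeof(BlockMapItem) depends on the platform's alignment rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAR_BLOCK_SIZE 512      /* tar headers and data are 512-byte aligned */

typedef uint32_t BlockNumber;   /* same widths as the PostgreSQL typedefs */
typedef uint64_t XLogRecPtr;

typedef struct BlockMapItem
{
    BlockNumber blkno;
    XLogRecPtr  lsn;
} BlockMapItem;

/* Round a member's data size up to the next multiple of 512 bytes. */
size_t
tar_padded_size(size_t data_size)
{
    return (data_size + TAR_BLOCK_SIZE - 1) / TAR_BLOCK_SIZE * TAR_BLOCK_SIZE;
}

/*
 * Bytes one block map member takes in the archive: one header block
 * plus the padded map, where each map entry is item_size bytes.
 */
size_t
blockmap_tar_footprint(size_t nchanged, size_t item_size)
{
    assert(item_size > 0);
    return TAR_BLOCK_SIZE + tar_padded_size(nchanged * item_size);
}
```

For three changed blocks, the 12 bytes of a plain BlockNumber array cost 1024 bytes in the archive, and so does the same map with LSNs attached. At this scale it is the fixed per-file overhead that dominates, which is what a per-relation or per-tablespace map would reduce.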
In my implementation, an invalid prev_backup_start_lsn means fallback to a
regular basebackup
without any block maps. Alternatively, we can define another meaning for
this value and send a block map for all files.
Backup utilities can use these maps to speed up backup merge or restore.
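As one illustration of how a utility might use maps that carry LSNs, the sketch below picks, for a given block, which backup in a chain of incrementals holds the newest copy, without opening any '.partial' file. The BlockMap wrapper and find_newest_copy() are hypothetical; nothing like them exists in the attached patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t BlockNumber;   /* same widths as the PostgreSQL typedefs */
typedef uint64_t XLogRecPtr;

typedef struct BlockMapItem
{
    BlockNumber blkno;
    XLogRecPtr  lsn;
} BlockMapItem;

/* Hypothetical wrapper: the block map of one incremental backup. */
typedef struct BlockMap
{
    const BlockMapItem *items;
    size_t      nitems;
} BlockMap;

/*
 * Return the index of the map holding the copy of 'blkno' with the
 * highest LSN, or -1 if no incremental contains the block (it must then
 * be taken from the full backup). On equal LSNs the later map wins.
 */
int
find_newest_copy(const BlockMap *maps, int nmaps, BlockNumber blkno)
{
    int         best = -1;
    XLogRecPtr  best_lsn = 0;

    assert(maps != NULL || nmaps == 0);

    for (int m = 0; m < nmaps; m++)
    {
        for (size_t i = 0; i < maps[m].nitems; i++)
        {
            const BlockMapItem *item = &maps[m].items[i];

            if (item->blkno == blkno && (best < 0 || item->lsn >= best_lsn))
            {
                best = m;
                best_lsn = item->lsn;
            }
        }
    }
    return best;
}
```

A linear scan keeps the sketch short; since the maps are produced in block order, a real tool could use binary search instead.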
2) We can implement a BASE_BACKUP SEND_FILELIST replication command,
which will return a list of filenames with file sizes, plus block maps if
an lsn was provided.
To avoid changing the format, we can simply send tar headers for each file:
- tarHeader("filename.blockmap") followed by the blockmap for relation files
if prev_backup_start_lsn is provided;
- tarHeader("filename") without actual file content for non-relation
files or for all files in a "FULL" backup.
The caller can parse the messages and use them for any purpose, for example,
to perform a parallel backup.
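On the client side, parsing such a file list could look roughly like the sketch below. It reads the name and size fields at their standard ustar offsets (name: 100 bytes at offset 0, size: 12 bytes of octal at offset 124). FileListEntry and parse_filelist_header() are hypothetical names, SEND_FILELIST itself is only a proposal, and only plain octal size fields are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAR_HEADER_SIZE  512
#define TAR_NAME_OFFSET  0
#define TAR_NAME_LEN     100
#define TAR_SIZE_OFFSET  124
#define TAR_SIZE_LEN     12

/* Hypothetical client-side view of one entry in the file list. */
typedef struct FileListEntry
{
    char        name[TAR_NAME_LEN + 1];
    uint64_t    size;           /* size recorded in the tar header */
    int         is_blockmap;    /* 1 if the name ends in ".blockmap" */
} FileListEntry;

/* Extract name and size from one 512-byte tar header. */
void
parse_filelist_header(const char *header, FileListEntry *entry)
{
    const char *suffix = ".blockmap";
    size_t      slen = strlen(suffix);
    size_t      len;
    int         i = 0;

    assert(header != NULL && entry != NULL);

    /* the name field is NUL-padded but not NUL-terminated when full */
    memcpy(entry->name, header + TAR_NAME_OFFSET, TAR_NAME_LEN);
    entry->name[TAR_NAME_LEN] = '\0';

    /* the size is octal text, possibly space-padded, ended by NUL or space */
    while (i < TAR_SIZE_LEN && header[TAR_SIZE_OFFSET + i] == ' ')
        i++;
    entry->size = 0;
    for (; i < TAR_SIZE_LEN; i++)
    {
        char        c = header[TAR_SIZE_OFFSET + i];

        if (c < '0' || c > '7')
            break;
        entry->size = entry->size * 8 + (uint64_t) (c - '0');
    }

    len = strlen(entry->name);
    entry->is_blockmap = (len > slen &&
                          strcmp(entry->name + len - slen, suffix) == 0);
}
```

A parallel backup client could collect these entries first and then distribute the files among workers by size.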
Thoughts?
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment | Content-Type | Size |
---|---|---|
incremental_basebackup_v0.patch | text/x-patch | 24.7 KB |