From: | Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | [RFC] Incremental backup v3: incremental PoC |
Date: | 2014-10-14 17:17:27 |
Message-ID: | 543D5AA7.9@2ndquadrant.it |
Lists: | pgsql-hackers |
Hi Hackers,
following the advice gathered on the list, I've prepared a third partial
patch towards implementing incremental pg_basebackup, as described
here: https://wiki.postgresql.org/wiki/Incremental_backup
== Changes
Compared to the previous version I've made the following changes:
* The backup_profile is not optional anymore. Generating it is cheap
enough not to bother the user with such a choice.
* I've isolated the code which detects the maxLSN of a segment in a
separate getMaxLSN function. At the moment it works by scanning the whole
file, but I plan to replace it in the next versions.
* I've made it possible to request an incremental backup by passing a "-I
<LSN>" option to pg_basebackup. It is probably too "raw" to remain as
is, but it is useful at this stage to test the code.
* I've modified the backup label to report the fact that the backup was
taken with the incremental option. The result will be something like:
START WAL LOCATION: 0/52000028 (file 000000010000000000000052)
CHECKPOINT LOCATION: 0/52000060
INCREMENTAL FROM LOCATION: 0/51000028
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2014-10-14 16:05:04 CEST
LABEL: pg_basebackup base backup
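To illustrate the whole-file scan that getMaxLSN performs at the moment, here is a minimal Python sketch (it is not the C code from the patch). It assumes the default 8 kB block size, that every page starts with its LSN stored as two 32-bit integers (pd_lsn), and a little-endian host. The names max_lsn_of_segment and format_lsn are made up for this example.

```python
import struct

BLCKSZ = 8192  # assumption: default PostgreSQL block size


def max_lsn_of_segment(path, byteorder="<"):
    """Return the highest page LSN found in a segment file, as an integer.

    Assumption: each page begins with pd_lsn, stored as two unsigned
    32-bit integers (high part, low part) in the host byte order.
    """
    max_lsn = 0
    with open(path, "rb") as f:
        while True:
            page = f.read(BLCKSZ)
            if len(page) < 8:
                break  # end of file (or a truncated trailing page)
            xlogid, xrecoff = struct.unpack_from(byteorder + "II", page, 0)
            max_lsn = max(max_lsn, (xlogid << 32) | xrecoff)
    return max_lsn


def format_lsn(lsn):
    """Format an integer LSN in the usual hi/lo hexadecimal notation."""
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)
```

The cost of this approach is one full read of every segment, which is why a persistent per-segment structure looks attractive.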
== Testing it
At this stage you can make an incremental file-level backup using this
procedure:
pg_basebackup -v -F p -D /tmp/x -x
LSN=$(awk '/^START WAL/{print $4}' /tmp/x/backup_profile)
pg_basebackup -v -F p -D /tmp/y -I $LSN -x
The result will be an incremental backup in /tmp/y based on the full
backup in /tmp/x.
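For those who prefer not to rely on awk, the LSN extraction in the procedure above could be sketched in Python as follows. This assumes that the backup_profile starts with "KEY: value" header lines in the same style as the backup label, followed by a "FILE LIST" marker. The helper names are hypothetical and not part of the patch.

```python
def read_header_fields(lines):
    """Collect 'KEY: value' header lines into a dict, stopping at FILE LIST."""
    fields = {}
    for line in lines:
        if line.strip() == "FILE LIST":
            break
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def start_wal_location(lines):
    """Return the LSN text of START WAL LOCATION, e.g. '0/52000028'."""
    # The value looks like '0/52000028 (file 0000...)'; keep the first word.
    return read_header_fields(lines)["START WAL LOCATION"].split()[0]


def parse_lsn(text):
    """Convert 'hi/lo' hexadecimal LSN text into a comparable integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)
```

Converting the LSN to an integer makes it easy to check, for instance, that the START WAL LOCATION of the new backup is greater than the LSN passed with -I.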
You can "reintegrate" the incremental backup in the /tmp/z directory
with the following little Python script, calling it as
./recover.py /tmp/x /tmp/y /tmp/z
----
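#!/usr/bin/env python
# recover.py
# Rebuild a full backup in "destination" from a base backup and an
# incremental backup taken on top of it.
import os
import shutil
import sys

if len(sys.argv) != 4:
    sys.stderr.write("usage: %s base incremental destination\n" % sys.argv[0])
    sys.exit(1)

base = sys.argv[1]
incr = sys.argv[2]
dest = sys.argv[3]

if os.path.exists(dest):
    sys.stderr.write("error: destination must not exist (%s)\n" % dest)
    sys.exit(1)

profile = open(os.path.join(incr, 'backup_profile'), 'r')

# Skip the profile header, up to the start of the file list
for line in profile:
    if line.strip() == 'FILE LIST':
        break

# Start from a copy of the incremental backup
shutil.copytree(incr, dest)

# Copy from the base backup every listed file that was not sent in the
# incremental backup and has an LSN recorded
for line in profile:
    tblspc, lsn, sent, date, size, path = line.strip().split('\t')
    if sent == 't' or lsn == '\\N':
        continue
    base_file = os.path.join(base, path)
    dest_file = os.path.join(dest, path)
    shutil.copy2(base_file, dest_file)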
----
It obviously has to be replaced by a full-fledged user tool, but it is
enough to test the concept.
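As a small step towards something easier to test, the selection logic of the script can be separated from the file copying. The sketch below only relies on what the script already assumes: tab-separated lines with six fields (tblspc, lsn, sent, date, size, path), 't' marking a file that was sent, and '\N' marking a missing LSN. ProfileEntry, parse_file_list and paths_from_base are hypothetical names.

```python
from collections import namedtuple

# One line of the FILE LIST section, in the field order used by recover.py
ProfileEntry = namedtuple("ProfileEntry", "tblspc lsn sent date size path")


def parse_file_list(lines):
    """Yield a ProfileEntry for each non-empty tab-separated line."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        yield ProfileEntry(*line.split("\t"))


def paths_from_base(entries):
    """Return the paths that must be copied from the base backup.

    Same rule as recover.py: skip files that were sent with the
    incremental backup and files without an LSN.
    """
    return [e.path for e in entries if e.sent != "t" and e.lsn != "\\N"]
```

A dry run that prints paths_from_base() before touching the destination is a cheap way to check what would be fetched from the base backup.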
== What next
I would like to replace the getMaxLSN function with a more-or-less persistent
structure which contains the maxLSN for each data segment.
To make it work I would hook into the ForwardFsyncRequest() function in
src/backend/postmaster/checkpointer.c and update an in-memory hash every
time a block is going to be fsynced. The structure could be persisted on
disk at some point (probably at checkpoint).
I think a good key for the hash would be a BufferTag with blocknum
"rounded" to the start of the segment.
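To make the idea of the key more concrete, here is a rough model in Python (the real thing would of course be C inside the backend). It assumes the default segment size of 131072 blocks (1 GB with 8 kB blocks) and a BufferTag made of spcNode, dbNode, relNode, forkNum and blockNum. Where the LSN comes from at fsync time is left open; here it is simply a parameter. All names are illustrative.

```python
from collections import namedtuple

RELSEG_SIZE = 131072  # assumption: blocks per segment in a default build

# Simplified stand-in for a BufferTag (relation identity, fork, block)
SegmentKey = namedtuple("SegmentKey", "spcNode dbNode relNode forkNum blockNum")


def segment_key(spc, db, rel, fork, blocknum):
    """Build a key whose blockNum is rounded down to the segment start."""
    return SegmentKey(spc, db, rel, fork, blocknum - blocknum % RELSEG_SIZE)


class MaxLSNMap(object):
    """In-memory map from segment key to the highest LSN seen so far."""

    def __init__(self):
        self._max = {}

    def note_block(self, spc, db, rel, fork, blocknum, lsn):
        """Record that a block with the given LSN is about to be fsynced."""
        key = segment_key(spc, db, rel, fork, blocknum)
        if lsn > self._max.get(key, 0):
            self._max[key] = lsn

    def max_lsn(self, spc, db, rel, fork, blocknum):
        """Return the highest LSN recorded for the segment, or None."""
        return self._max.get(segment_key(spc, db, rel, fork, blocknum))
```

With such a key, all the blocks of a segment collapse onto a single entry, so the map grows with the number of segments and not with the number of blocks.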
I'm asking for comments and advice on how to implement it in an
acceptable way.
== Disclaimer
The code here is an intermediate step: it does not contain any
documentation besides the code comments and will be subject to deep and
radical changes. However, I believe it can be a base to allow PostgreSQL
to have its file-based incremental backup, and a block-based incremental
backup after it.
Regards,
Marco
--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco(dot)nenciarini(at)2ndQuadrant(dot)it | www.2ndQuadrant.it
Attachment | Content-Type | Size |
---|---|---|
file-based-incremental-backup.patch | text/plain | 31.3 KB |