Re: [PATCH] Incremental backup: add backup profile to base backup

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Incremental backup: add backup profile to base backup
Date: 2014-08-20 12:36:47
Message-ID: 53F4965F.4030905@vmware.com
Lists: pgsql-hackers

I think this has had enough review for a WIP patch. I'm marking this as
Returned with Feedback in the commitfest because:

* it should use LSNs instead of an MD5 checksum
* this doesn't do anything useful on its own, hence we would need to see
the whole solution before committing
* it's not clear how this would be extended if you wanted to do something
more fine-grained than file-level diffs.

But please feel free to continue discussing those items.
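
To make the first and third points concrete, here is a rough Python
sketch of the idea; it is not code from the patch. Every relation page
starts with its pd_lsn, so a tool could compare that value against the
start LSN of the previous backup instead of hashing whole files, and it
would get block-level granularity for free. The helper names
(parse_lsn, page_lsn, changed_pages), the 8 kB block size and the
little-endian byte order are assumptions for illustration; new or
zeroed pages and non-relation files would need separate handling.

```python
import struct

BLCKSZ = 8192  # default PostgreSQL block size (assumed)


def parse_lsn(text):
    """Convert an LSN written as 'hi/lo' in hex, e.g. '0/E000028', to an int."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def page_lsn(page):
    """Read pd_lsn from the first 8 bytes of a page (two uint32: high, low)."""
    xlogid, xrecoff = struct.unpack_from("<II", page, 0)  # little-endian assumed
    return (xlogid << 32) | xrecoff


def changed_pages(data, since_lsn):
    """Return the block numbers whose pd_lsn is newer than since_lsn."""
    changed = []
    for blkno in range(len(data) // BLCKSZ):
        page = data[blkno * BLCKSZ:(blkno + 1) * BLCKSZ]
        if page_lsn(page) > since_lsn:
            changed.append(blkno)
    return changed
```

With something like this, an unchanged file never needs to be read in
full to be checksummed, and a changed file can be shipped block by
block.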

On 08/18/2014 03:04 AM, Marco Nenciarini wrote:
> Hi Hackers,
>
> This is the first piece of file-level incremental backup support, as
> described on the wiki page https://wiki.postgresql.org/wiki/Incremental_backup
>
> It is not yet complete, but I wish to share it on the list to receive
> comments and suggestions.
>
> The point of the patch is to add an option to pg_basebackup and to the
> replication protocol's BASE_BACKUP command to generate a backup_profile file.
>
> When taking a full backup with pg_basebackup, the user can request
> Postgres to generate a backup_profile file through the --profile option
> (-B short option, which I've arbitrarily picked because both -P and
> -p were already taken)
>
> At the moment the backup profile consists of a file with one line per
> file detailing modification time, md5, size, tablespace and path
> relative to tablespace root (PGDATA or the tablespace)
>
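As a rough illustration of the profile described above (this is not the
patch's actual on-disk format), the Python sketch below builds one line
per file from the five listed fields. The tab separator, the field
order, the epoch-seconds timestamp, the helper names and the
"pg_default" tablespace label are all assumptions. The file is hashed
in chunks so that large relation segments are never loaded into memory
at once.

```python
import hashlib
import os


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def profile_line(root, relpath, tablespace="pg_default"):
    """Build one hypothetical profile line: mtime, md5, size, tablespace, path."""
    full = os.path.join(root, relpath)
    st = os.stat(full)
    fields = [
        str(int(st.st_mtime)),  # modification time, seconds since the epoch
        file_md5(full),         # checksum of the file contents
        str(st.st_size),        # size in bytes
        tablespace,             # tablespace label (hypothetical naming)
        relpath,                # path relative to the tablespace root
    ]
    return "\t".join(fields)
```
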
> To calculate the md5 checksum I've used the md5 code present in pgcrypto
> contrib as the code in src/include/libpq/md5.h is not suitable for large
> files. Since a core feature cannot depend on a piece of contrib, I've
> moved the files
>
> contrib/pgcrypto/md5.c
> contrib/pgcrypto/md5.h
>
> to
>
> src/backend/utils/hash/md5.c
> src/include/utils/md5.h
>
> changing the pgcrypto extension to use them.
>
> There are still some TODOs:
>
> * User documentation
>
> * Remove the pg_basebackup code duplication I've introduced with the
> ReceiveAndUnpackTarFileToDir function, which is almost the same as
> ReceiveAndUnpackTarFile but does not expect to handle a tablespace.
> Instead, it simply extracts a tar stream into a destination directory.
> The latter could probably be rewritten using the former as a component,
> but it needs some adjustment to the "progress reporting" part, which is
> not present at the moment in ReceiveAndUnpackTarFileToDir.
>
> * Add a header section to the backup_profile file, which at the moment
> contains only the body part. I'm thinking of changing the original
> design and putting the whole backup label in as the header, which is
> IMHO clearer and well known. I would use something like:
>
> START WAL LOCATION: 0/E000028 (file 00000001000000000000000E)
> CHECKPOINT LOCATION: 0/E000060
> BACKUP METHOD: streamed
> BACKUP FROM: master
> START TIME: 2014-08-14 18:54:01 CEST
> LABEL: pg_basebackup base backup
> END LABEL
>
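A header in that shape is easy to consume. The Python sketch below is
an illustration only, with hypothetical helper names: it parses the
"KEY: value" lines up to END LABEL and, assuming the default 16 MB WAL
segment size and timeline 1, derives the WAL file name from the start
location so it can be cross-checked against the one given in
parentheses.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # default segment size (assumed)


def parse_label(text):
    """Parse 'KEY: value' lines into a dict, stopping at 'END LABEL'."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "END LABEL":
            break
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    return fields


def wal_file_name(lsn_text, timeline=1):
    """Derive the 24-character WAL segment file name containing an LSN."""
    hi, lo = lsn_text.split("/")
    lsn = (int(hi, 16) << 32) | int(lo, 16)
    segno = lsn // WAL_SEGMENT_SIZE
    segs_per_id = 0x100000000 // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (timeline, segno // segs_per_id, segno % segs_per_id)


label = """START WAL LOCATION: 0/E000028 (file 00000001000000000000000E)
CHECKPOINT LOCATION: 0/E000060
BACKUP METHOD: streamed
END LABEL"""

fields = parse_label(label)
start_lsn = fields["START WAL LOCATION"].split()[0]
print(start_lsn, wal_file_name(start_lsn))
```
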
> I've attached the current patch based on master.
>
> Any comment will be appreciated.
>
> Regards,
> Marco
>

--
- Heikki
