Automatic tablespace management in pg_basebackup

From: Thom Brown <thom(at)linux(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Automatic tablespace management in pg_basebackup
Date: 2024-04-27 03:07:14
Message-ID: CAA-aLv7BRCNaqFeWp7Wx6ro74+WpWURyuVEYP0j=HNue1a3DTQ@mail.gmail.com
Lists: pgsql-hackers

Hi,

Manually specifying tablespace mappings in pg_basebackup, especially in
environments where tablespaces can come and go, or with incremental
backups, can be tedious and error-prone. I propose a solution using
pattern-based mapping to automate this process.

So rather than having to specify:

-T /path/to/original/tablespace/a=/path/to/backup/tablespace/a
-T /path/to/original/tablespace/b=/path/to/backup/tablespace/b

and then having to come up with a new location to map to for each subsequent
incremental backup, perhaps we could have a parameter (I’m just going to
choose M for “mapping”), like so:

-M %p/%d_backup_1.1

Here it could interpolate the following values:
%p = path (the parent directory of the original tablespace)
%d = directory (the tablespace's own directory name)
%l = label (not sure about this one)

Using the -M example above, when pg_basebackup finds:

/path/to/original/tablespace/a
/path/to/original/tablespace/b

It creates:

/path/to/original/tablespace/a_backup_1.1
/path/to/original/tablespace/b_backup_1.1

Or:

-M /path/to/backup/tablespaces/1.1/%d

Creates:

/path/to/backup/tablespaces/1.1/a
/path/to/backup/tablespaces/1.1/b

Or we could possibly allow something like %l to insert the backup label.

For example:

-M /path/to/backup/tablespaces/%d_%l -l 1.1

Creates:

/path/to/backup/tablespaces/a_1.1
/path/to/backup/tablespaces/b_1.1
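To make the intended semantics concrete, here is a minimal Python sketch of the placeholder expansion, not a proposed implementation. It assumes %p is the parent directory of the original tablespace and %d its last path component, as the examples imply; the `%%` escape and the error on unknown placeholders are my own assumptions, and `expand_mapping` is a hypothetical name.

```python
import os.path


def expand_mapping(pattern: str, tablespace: str, label: str = "") -> str:
    """Expand a hypothetical -M pattern for one original tablespace path."""
    path = os.path.normpath(tablespace)
    values = {
        "p": os.path.dirname(path),   # parent directory of the tablespace
        "d": os.path.basename(path),  # the tablespace's own directory name
        "l": label,                   # backup label, as passed with -l
        "%": "%",                     # assumed escape for a literal percent sign
    }
    out = []
    i = 0
    while i < len(pattern):
        if pattern[i] != "%":
            out.append(pattern[i])
            i += 1
            continue
        if i + 1 == len(pattern):
            raise ValueError(f"pattern ends with a lone '%': {pattern!r}")
        key = pattern[i + 1]
        if key not in values:
            raise ValueError(f"unknown placeholder %{key} in {pattern!r}")
        out.append(values[key])
        i += 2
    return "".join(out)


for ts in ("/path/to/original/tablespace/a", "/path/to/original/tablespace/b"):
    print(expand_mapping("%p/%d_backup_1.1", ts))
    print(expand_mapping("/path/to/backup/tablespaces/1.1/%d", ts))
    print(expand_mapping("/path/to/backup/tablespaces/%d_%l", ts, label="1.1"))
```

Rejecting unknown placeholders rather than passing them through means a typo in the pattern fails loudly instead of silently creating an oddly named directory.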

This of course would not work if there were tablespaces as follows:

/path/to/first/tablespace/a
/path/to/second/tablespace/a

Here, %d would yield the same result for both tablespaces. However, this
seems an unlikely scenario: tablespace names must be unique within the
cluster, so it would require the user to have deliberately given two
tablespaces directories with the same name. It could simply be an
unsupported scenario.
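If this case is left unsupported, the tool could at least detect it before copying anything and refuse to proceed. The sketch below assumes that approach; `find_collisions` and `by_dirname` are hypothetical helpers, and the target directory is taken from the earlier `/path/to/backup/tablespaces/1.1/%d` example.

```python
import os.path
from collections import defaultdict


def find_collisions(tablespaces, target_for):
    """Group original tablespace paths by the target they would map to.

    Returns only the targets that more than one tablespace would claim.
    """
    by_target = defaultdict(list)
    for ts in tablespaces:
        by_target[target_for(ts)].append(ts)
    return {t: srcs for t, srcs in by_target.items() if len(srcs) > 1}


def by_dirname(ts):
    # Equivalent of the hypothetical "-M /path/to/backup/tablespaces/1.1/%d".
    name = os.path.basename(os.path.normpath(ts))
    return os.path.join("/path/to/backup/tablespaces/1.1", name)


clashes = find_collisions(
    ["/path/to/first/tablespace/a", "/path/to/second/tablespace/a"], by_dirname
)
for target, sources in clashes.items():
    print(f"error: {target} would be used by: {', '.join(sources)}")
```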

Perhaps it could even auto-increment a version number that it defines
itself. Maybe %v implies “make up a version number here, and if one
existed in the manifest previously, increment it”.

Ultimately, it would turn this:

pg_basebackup \
  -D /Users/thombrown/Development/backups/data1.5 \
  -h /tmp \
  -p 5999 \
  -c fast \
  -U thombrown \
  -l 1.5 \
  -T /Users/thombrown/Development/tablespaces/ts_a=/Users/thombrown/Development/backups/tablespaces/1.5/backup_ts_a \
  -T /Users/thombrown/Development/tablespaces/ts_b=/Users/thombrown/Development/backups/tablespaces/1.5/backup_ts_b \
  -T /Users/thombrown/Development/tablespaces/ts_c=/Users/thombrown/Development/backups/tablespaces/1.5/backup_ts_c \
  -T /Users/thombrown/Development/tablespaces/ts_d=/Users/thombrown/Development/backups/tablespaces/1.5/backup_ts_d \
  -i /Users/thombrown/Development/backups/data1.4/backup_manifest

Into this:

pg_basebackup \
  -D /Users/thombrown/Development/backups/1.5/data \
  -h /tmp \
  -p 5999 \
  -c fast \
  -U thombrown \
  -l 1.5 \
  -M /Users/thombrown/Development/backups/tablespaces/%v/%d \
  -i /Users/thombrown/Development/backups/data1.4/backup_manifest
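One way to see that -M is only shorthand is to expand it back into the explicit `-T olddir=newdir` options pg_basebackup already accepts. This is a sketch under stated assumptions: `mapping_options` is a hypothetical helper, only %v and %d are handled, and the version string is simply passed in. Note that the pattern above keeps each tablespace's original directory name (ts_a rather than backup_ts_a); the sketch uses `.../%v/backup_%d` so that it reproduces the long command exactly.

```python
import os.path
import re
import shlex


def mapping_options(pattern: str, tablespaces: list[str], version: str) -> list[str]:
    """Turn a hypothetical -M pattern into explicit -T olddir=newdir options.

    Only %v (version) and %d (last component of the original path) are
    handled; the version is assumed to be supplied by the caller.
    """
    opts = []
    for ts in tablespaces:
        values = {"v": version, "d": os.path.basename(os.path.normpath(ts))}
        # Substitute in a single pass so a value containing "%d" is never re-expanded.
        target = re.sub(r"%([vd])", lambda m: values[m.group(1)], pattern)
        opts += ["-T", f"{ts}={target}"]
    return opts


tablespaces = [f"/Users/thombrown/Development/tablespaces/ts_{c}" for c in "abcd"]
opts = mapping_options(
    "/Users/thombrown/Development/backups/tablespaces/%v/backup_%d", tablespaces, "1.5"
)
print(shlex.join(opts))
```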

In fact, if I were permitted to get carried away:

-D /Users/thombrown/Development/backups/%v/%d

Then, the only thing that needs changing for each incremental backup is the
manifest location (and optionally the label).

Given that pg_combinebackup has the same -T option, I imagine something
similar would need to be added there too. We should already know where the
tablespaces reside, as they are in the final backup specified in the list
of backups, so it seems to be just a matter of getting input on how the
tablespaces should be named in the reconstructed backup.

For example:

pg_combinebackup \
  -T /Users/thombrown/Development/backups/tablespaces/1.4/ts_a=/Users/thombrown/Development/backups/tablespaces/2.0_combined/ts_a \
  -T /Users/thombrown/Development/backups/tablespaces/1.4/ts_b=/Users/thombrown/Development/backups/tablespaces/2.0_combined/ts_b \
  -T /Users/thombrown/Development/backups/tablespaces/1.4/ts_c=/Users/thombrown/Development/backups/tablespaces/2.0_combined/ts_c \
  -T /Users/thombrown/Development/backups/tablespaces/1.4/ts_d=/Users/thombrown/Development/backups/tablespaces/2.0_combined/ts_d \
  -o /Users/thombrown/Development/backups/combined \
  /Users/thombrown/Development/backups/data{1.0_full,1.1,1.2,1.3,1.4}

Becomes:

pg_combinebackup \
  -M /Users/thombrown/Development/backups/tablespaces/%v_combined/%d \
  -o /Users/thombrown/Development/backups/%v_combined/%d \
  /Users/thombrown/Development/backups/{1.0_full,1.1,1.2,1.3,1.4}/data

You may have inferred that I decided pg_combinebackup increments the
version to the next major version, whereas pg_basebackup in incremental
mode increments the minor version number.
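The two increment rules can be sketched as a single function. This assumes versions are plain "major.minor" strings (so a label like "1.0_full" would need its suffix stripped first) and that the previous version is available from somewhere, such as the manifest as suggested above; `next_version` and its mode names are hypothetical.

```python
def next_version(previous: str, mode: str) -> str:
    """Bump a "major.minor" version string (hypothetical %v behaviour).

    mode "incremental" (pg_basebackup -i): bump the minor number, 1.4 -> 1.5
    mode "combine" (pg_combinebackup):     bump the major number, 1.4 -> 2.0
    """
    major_s, minor_s = previous.split(".")
    major, minor = int(major_s), int(minor_s)
    if mode == "incremental":
        # Numeric, not lexical: 1.9 is followed by 1.10.
        return f"{major}.{minor + 1}"
    if mode == "combine":
        return f"{major + 1}.0"
    raise ValueError(f"unknown mode {mode!r}")


print(next_version("1.4", "incremental"))
print(next_version("1.4", "combine"))
```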

This, of course, becomes messy if the user decides to include the version
number in the backup tablespace directory name, but then these sorts of
things need to be figured out before going into production anyway.

I also get the feeling that accepting an unquoted % in a command-line
argument could be problematic, for instance if it has a special meaning in
some shell that I haven't accounted for here. If so, it may need quoting.

Thoughts?

Regards

Thom
