Re: Add system identifier to backup manifest

From: Amul Sul <sulamul(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add system identifier to backup manifest
Date: 2024-01-19 17:06:26
Message-ID: CAAJ_b94Ch7HW=+C=fLz1oqDDv-2dqctmcTOkgCHGq4LwSEQd8g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 17, 2024 at 8:40 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Wed, Jan 17, 2024 at 6:31 AM Amul Sul <sulamul(at)gmail(dot)com> wrote:
> > With the attached patch, the backup manifest will have a new key item as
> > "System-Identifier" 64-bit integer whose value is derived from
> pg_control while
> > generating it, and the manifest version bumps to 2.
> >
> > This helps to identify the correct database server and/or backup for the
> > subsequent backup operations. pg_verifybackup validates the manifest
> system
> > identifier against the backup control file and fails if they don’t match.
> > Similarly, pg_basebackup increment backup will fail if the manifest
> system
> > identifier does not match with the server system identifier. The
> > pg_combinebackup is already a bit smarter -- checks the system
> identifier from
> > the pg_control of all the backups, with this patch the manifest system
> > identifier also validated.
>
> Thanks for working on this. Without this, I think what happens is that
> you can potentially take an incremental backup from the "wrong"
> server, if the states of the systems are such that all of the other
> sanity checks pass. When you run pg_combinebackup, it'll discover the
> problem and tell you, but you ideally want to discover such errors at
> backup time rather than at restore time. This addresses that. And,
> overall, I think it's a pretty good patch. But I nonetheless have a
> bunch of comments.
>

Thank you for the review.

>
> - The associated value is always the integer 1.
> + The associated value is the integer, either 1 or 2.
>
> is an integer. Beginning in <productname>PostgreSQL</productname> 17,
> it is 2; in older versions, it is 1.
>

Ok,

> + context.identity_cb = manifest_process_identity;
>
> I'm not really on board with calling the system identifier "identity"
> throughout the patch. I think it should just say system_identifier. If
> we were going to abbreviate, I'd prefer something like "sysident" that
> looks like it's derived from "system identifier" rather than
> "identity" which is a different word altogether. But I don't think we
> should abbreviate unless doing so creates *ridiculously* long
> identifier names.
>

Ok, used "system identifier" at all the places.

> +static void
> +manifest_process_identity(JsonManifestParseContext *context,
> + int manifest_version,
> + uint64 manifest_system_identifier)
> +{
> + uint64 system_identifier;
> +
> + /* Manifest system identifier available in version 2 or later */
> + if (manifest_version == 1)
> + return;
>
> I think you've got the wrong idea here. I think this function would
> only get called if System-Identifier is present in the manifest, so if
> it's a v1 manifest, this would never get called, so this if-statement
> would not ever do anything useful. I think what you should do is (1)
> if the client supplies a v1 manifest, reject it, because surely that's
> from an older server version that doesn't support incremental backup;
> but do that when the version is parsed rather than here; and (2) also
> detect and reject the case when it's supposedly a v2 manifest but this
> is absent.
>
> (1) should really be done when the version number is parsed, so I
> suspect you may need to add manifest_version_cb.
>
> +static void
> +combinebackup_identity_cb(JsonManifestParseContext *context,
> + int manifest_version,
> + uint64 manifest_system_identifier)
> +{
> + parser_context *private_context = context->private_data;
> + uint64 system_identifier = private_context->system_identifier;
> +
> + /* Manifest system identifier available in version 2 or later */
> + if (manifest_version == 1)
> + return;
>
> Very similar to the above case. Just reject a version 1 manifest as
> soon as we see the version number. In this function, set a flag
> indicating we saw the system identifier; if at the end of parsing that
> flag is not set, kaboom.
>

Ok, I added a version_cb callback. Using this pg_combinebackup &
pg_basebackup
will report an error for manifest version 1, whereas pg_verifybackup
doesn't (not needed IIUC).

>
> - parse_manifest_file(manifest_path, &context.ht, &first_wal_range);
> + parse_manifest_file(manifest_path, &context.ht, &first_wal_range,
> + context.backup_directory);
>
> Don't do this! parse_manifest_file() should just record everything
> found in the manifest in the context object. Syntax validation should
> happen while parsing the manifest (e.g. "CAT/DOG" is not a valid LSN
> and we should reject that at this stage) but semantic validation
> should happen later (e.g. "0/0" can't be a the correct backup end LSN
> but we don't figure that out while parsing, but rather later). I think
> you should actually move validation of the system identifier to the
> point where the directory walk encounters the control file (and update
> the docs and tests to match that decision). Imagine if you wanted to
> validate a tar-format backup; then you wouldn't have random access to
> the directory. You'd see the manifest file first, and then all the
> files in a random order, with one chance to look at each one.
>
>
Agree. I have moved the system identifier validation after manifest
parsing.
But, not in the directory walkthrough since in pg_combinebackup, we don't
really needed to open the pg_control file to get the system identifier,
which we
have from the check_control_files().

> (This is, in fact, a feature I think we should implement.)
>
> - if (strcmp(token, "1") != 0)
> + parse->manifest_version = atoi(token);
> + if (parse->manifest_version != 1 && parse->manifest_version != 2)
> json_manifest_parse_failure(parse->context,
> "unexpected manifest version");
>
> Please either (a) don't do a string-to-integer conversion and just
> strcmp() twice or (b) use strtol so that you can check that it
> succeeded. I don't want to accept manifest version 1a as 1.
>

Understood, corrected in the attached version.

> +/*
> + * Validate manifest system identifier against the database server system
> + * identifier.
> + */
>
> This comment assumes you know what the callback is going to do, but
> you don't. This should be more like the comment for
> json_manifest_finalize_file or json_manifest_finalize_wal_range.
>

Ok.

Updated version is attached.

Regards,
Amul

Attachment Content-Type Size
v2-0001-Add-system-identifier-to-backup-manifest.patch application/x-patch 34.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Matthias Kuhn 2024-01-19 17:12:24 Re: Build versionless .so for Android
Previous Message James Coleman 2024-01-19 17:00:42 PG12 change to DO UPDATE SET column references