Re: Add system identifier to backup manifest

From: Amul Sul <sulamul(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add system identifier to backup manifest
Date: 2024-01-25 07:52:00
Message-ID: CAAJ_b9417seAdhwhMVr=PnQ+MPx7uAeS5FAw2qMCXF5MrE-7wQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 24, 2024 at 10:53 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Mon, Jan 22, 2024 at 2:22 AM Amul Sul <sulamul(at)gmail(dot)com> wrote:
> > Thinking a bit more on this, I realized parse_manifest_file() has many
> out
> > parameters. Instead parse_manifest_file() should simply return manifest
> data
> > like load_backup_manifest(). Attached 0001 patch doing the same, and
> removed
> > parser_context structure, and added manifest_data, and did the required
> > adjustments to pg_verifybackup code.
>
> InitializeBackupManifest(&manifest, opt->manifest,
> -
> opt->manifest_checksum_type);
> +
> opt->manifest_checksum_type,
> +
> GetSystemIdentifier());
>
> InitializeBackupManifest() can just call GetSystemIdentifier() itself,
> instead of passing another parameter, I think.
>

Ok.

>
> + if (manifest_version == 1)
> + context->error_cb(context,
> + "%s: backup manifest
> doesn't support incremental changes",
> +
> private_context->backup_directory);
>
> I think this is weird. First, there doesn't seem to be any reason to
> bounce through error_cb() here. You could just call pg_fatal(), as we
> do elsewhere in this file. Second, there doesn't seem to be any need
> to include the backup directory in this error message. We include the
> file name (not the directory name) in errors that pertain to the file
> itself, like if we can't open or read it. But we don't do that for
> semantic errors about the manifest contents (cf.
> combinebackup_per_file_cb). This file would need a lot fewer charges
> if you didn't feed the backup directory name through here. Third, the
> error message is not well-chosen, because a user who looks at it won't
> understand WHY the manifest doesn't support incremental changes. I
> suggest "backup manifest version 1 does not support incremental
> backup".
>
> + /* Incremental backups supported on manifest version 2 or later */
> + if (manifest_version == 1)
> + context->error_cb(context,
> + "incremental backups
> cannot be taken for this backup");
>
> Let's use the same error message as in the previous case here also.
>

Ok.

> + for (i = 0; i < n_backups; i++)
> + {
> + if (manifests[i]->system_identifier != system_identifier)
> + {
> + char *controlpath;
> +
> + controlpath = psprintf("%s/%s",
> prior_backup_dirs[i], "global/pg_control");
> +
> + pg_fatal("manifest is from different database
> system: manifest database system identifier is %llu, %s system
> identifier is %llu",
> + (unsigned long long)
> manifests[i]->system_identifier,
> + controlpath,
> + (unsigned long long)
> system_identifier);
> + }
> + }
>
> check_control_files() already verifies that all of the control files
> contain the same system identifier as each other, so what we're really
> checking here is that the backup manifest in each directory has the
> same system identifier as the control file in that same directory. One
> problem is that backup manifests are optional here, as per the comment
> in load_backup_manifests(), so you need to skip over NULL entries
> cleanly to avoid seg faulting if one is missing. I also think the
> error message should be changed. How about "%s: manifest system
> identifier is %llu, but control file has %llu"?
>

Ok.

> + context->error_cb(context,
> + "manifest is from
> different database system: manifest database system identifier is
> %llu, pg_control database system identifier is %llu",
> + (unsigned long long)
> manifest_system_identifier,
> + (unsigned long long)
> system_identifier);
>
> And here, while I'm kibitzing, how about "manifest system identifier
> is %llu, but this system's identifier is %llu"?
>

I used "database system identifier" instead of "this system's identifier "
like
we are using in WalReceiverMain() and libpqrcv_identify_system().

> - qr/could not open directory/,
> + qr/could not open file/,
>
> I don't think that the expected error message here should be changing.
> Does it really, with the latest patch version? Why? Can we fix that?
>

Because, we were trying to access pg_control to check the system identifier
before any other backup directory/file validation.

> + else if (!parse->saw_system_identifier_field &&
> +
> strcmp(parse->manifest_version, "1") != 0)
>
> I don't think this has any business testing the manifest version.
> That's something to sort out at some later stage.
>

That is for backward compatibility, otherwise, we would have an "expected
system identifier" error for manifest version 1.

Thank you for the review-comments, updated version attached.

Regards,
Amul

Attachment Content-Type Size
v5-0001-pg_verifybackup-code-refactor.patch application/x-patch 8.0 KB
v5-0002-Add-system-identifier-to-backup-manifest.patch application/x-patch 30.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bertrand Drouvot 2024-01-25 07:55:32 Re: Synchronizing slots from primary to standby
Previous Message Richard Guo 2024-01-25 07:47:04 Re: A compiling warning in jsonb_populate_record_valid