Re: backup manifests

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>
Cc: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Tels <nospam-pg-abuse(at)bloodgate(dot)com>, David Steele <david(at)pgmasters(dot)net>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: backup manifests
Date: 2019-12-30 18:22:59
Message-ID: CA+TgmobnSHW3upkS_x6D=cM9A3jT9ZA0pcH0FJitBfkyW7sFJA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 24, 2019 at 5:42 AM Suraj Kharage
<suraj(dot)kharage(at)enterprisedb(dot)com> wrote:
> Made the change in backup manifest as well in backup validatort patch. Thanks to Rushabh Lathia for the offline discussion and help.
>
> To examine the first word of each line, I am using below check:
> if (strncmp(line, "File", 4) == 0)
> {
> ..
> }
> else if (strncmp(line, "Manifest-Checksum", 17) == 0)
> {
> ..
> }
> else
> error
>
> strncmp might be not right here, but we can not put '\0' in between the line (to find out first word)
> before we recognize the line type.
> All the lines expect line last one (where we have manifest checksum) are feed to the checksum machinary to calculate manifest checksum.
> so update_checksum() should be called after recognizing the type, i.e: if it is a File type record. Do you see any issues with this?

I see the problem, but I don't think your solution is right, because
the first test would pass if the line said FiletMignon rather than
just File, which we certainly don't want. You've got to write the test
so that you're checking against the whole first word, not just some
prefix of it. There are several possible ways to accomplish that, but
this isn't one of them.

>> + pg_log_error("invalid record found in \"%s\"", manifest_path);
>>
>> Error message needs work.

Looks better now, but you have a messages that say "invalid checksums
type \"%s\" found in \"%s\"". This is wrong because checksums would
need to be singular in this context (checksum). Also, I think it could
be better phrased as "manifest file \"%s\" specifies unknown checksum
algorithm \"%s\" at line %d".

>> Your function names should be consistent with the surrounding style,
>> and with each other, as far as possible. Three different conventions
>> within the same patch and source file seems over the top.

This appears to be fixed.

>> Also keep in mind that you're not writing code in a vacuum. There's a
>> whole file of code here, and around that, a whole project.
>> scan_data_directory() is a good example of a function whose name is
>> clearly too generic. It's not a general-purpose function for scanning
>> the data directory; it's specifically a support function for verifying
>> a backup. Yet, the name gives no hint of this.

But this appears not to be fixed.

>> if (strcmp(newpathsuffix, "/pg_wal") == 0 || strcmp(newpathsuffix,
>> "/backup_manifest") == 0)
>> continue;
>
> Thanks for the suggestion. Corrected as per the above inputs.

You need a comment here, like "Ignore the possible presence of a
backup_manifest file and/or a pg_wal directory in the backup being
verified." and then maybe another sentence explaining why that's the
right thing to do.

+ * The forth parameter to VerifyFile() will pass the relative path
+ * of file to match exactly with the filename present in manifest.

I don't know what this comment is trying to tell me, which might be
something you want to try to fix. However, I'm pretty sure it's
supposed to say "fourth" not "forth".

>> and the result would be that everything inside that long if-block is
>> now at the top level of the function and indented one level less. And
>> I think if you look at this function you'll see a way that you can
>> save a *second* level of indentation for much of that code. Please
>> check the rest of the patch for similar cases, too.
>
> Make sense. corrected.

I don't agree. A large chunk of VerifyFile() is still subject to a
quite unnecessary level of indentation.

> I have added a check for EOF, but not sure whether that woule be right here.
> Do we need to check the length of buffer as well?

That's really, really not right. EOF is not a character that can
appear in the buffer. It's chosen on purpose to be a value that never
matches any actual character when both the character and the EOF value
are regarded as values of type 'int'. That guarantee doesn't apply
here though because you're dealing with values of type 'char'. So what
this code is doing is searching for an impossible value using
incorrect logic, which has very little to do with the actual need
here, which is to avoid running off the end of the buffer. To see what
the problem is, try creating a file with no terminating newline, like
this:

echo -n this file has no terminating newline >> some-file

I doubt it will be very hard to make this patch crash horribly. Even
if you can't, it seems pretty clear that the logic isn't right.

I don't really know what the \0 tests in NextLine() and NextWord()
think they're doing either. If there's a \0 in the buffer before you
add one, it was in the original input data, and pretending like that
marks a word or line boundary seems like a fairly arbitrary choice.

What I suggest is:

(1) Allocate one byte more than the file size for the buffer that's
going to hold the file, so that if you write a \0 just after the last
byte of the file, you don't overrun the allocated buffer.

(2) Compute char *endptr = buf + len.

(3) Pass endptr to NextLine and NextWord and write the loop condition
something like while (*buf != '\n' && buf < endptr).

Other notes:

- The error handling in ReadFileIntoBuffer() does not seem to consider
the case of a short read. If you look through the source tree, you can
find examples of how we normally handle that.

- Putting string_hash_sdbm() into encode.c seems like a surprising
choice. What does this have to do with encoding anything? And why is
it going into src/common at all if it's only intended for frontend
use?

- It seems like whether or not any problems were found while verifying
the manifest ought to affect the exit status of pg_basebackup. I'm not
exactly sure what exit codes ought to be used, but you could look for
similar precedents. Document this, too.

- As much as possible let's have errors in the manifest file report
the line number, and let's also try to make them more specific, e.g.
instead of "invalid manifest record found in \"%s\"", perhaps
"manifest file \"%s\" contains invalid keyword \"%s\" at line %d".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2019-12-30 18:52:50 Re: [HACKERS] pg_shmem_allocations view
Previous Message Robert Haas 2019-12-30 17:44:53 Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence