Re: backup manifests

From: Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Tels <nospam-pg-abuse(at)bloodgate(dot)com>, David Steele <david(at)pgmasters(dot)net>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: backup manifests
Date: 2019-12-12 12:32:49
Message-ID: CAF1DzPVeJdV3q4oiCBaTs5jMoUYkqNRicAYR-DAHuHGA7ccO=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks, Robert for the review.

On Wed, Dec 11, 2019 at 1:10 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Tue, Dec 10, 2019 at 6:40 AM Suraj Kharage
> <suraj(dot)kharage(at)enterprisedb(dot)com> wrote:
> > Please find attached patch for backup validator implementation (0004
> patch). This patch is based
> > on Rushabh's latest patch for backup manifest.
> >
> > There are some functions required at client side as well, so I have
> moved those functions
> > and some data structure at common place so that they can be accessible
> for both. (0003 patch).
> >
> > My colleague Rajkumar Raghuwanshi has prepared the WIP patch (0005) for
> tap test cases which
> > is also attached. As of now, test cases related to the tablespace and
> tar backup format are missing,
> > will continue work on same and submit the complete patch.
> >
> > With this mail, I have attached the complete patch stack for backup
> manifest and backup
> > validate implementation.
> >
> > Please let me know your thoughts on the same.
>
> Well, for the second time on this thread, please don't take a bunch of
> somebody else's code and post it in a patch that doesn't attribute
> that person as one of the authors. For the second time on this thread,
> the person is me, but don't borrow *anyone's* code without proper
> attribution. It's really important!
>
> On a related note, it's a very good idea to use git format-patch and
> git rebase -i to maintain patch stacks like this. Rushabh seems to
> have done that, but the files you're posting look like raw 'git diff'
> output. Notice that this gives him a way to include authorship
> information and a tentative commit message in each patch, but you
> don't have any of that.
>

Sorry, I have corrected this in the attached v2 patch set.

> Also on a related note, part of the process of adapting existing code
> to a new purpose is adapting the comments. You haven't done that:
>
> + * Search a result-set hash table for a row matching a given filename.
> ...
> + * Insert a row into a result-set hash table, provided no such row is
> already
> ...
> + * Most of the values
> + * that we're hashing are short integers formatted as text, so there
> + * shouldn't be much room for pathological input.
>
Corrected in v2 patch.

> I think that what we should actually do here is try to use simplehash.
> Right now, it won't work for frontend code, but I posted some patches
> to try to address that issue:
>
>
> https://www.postgresql.org/message-id/CA%2BTgmob8oyh02NrZW%3DxCScB%2B5GyJ-jVowE3%2BTWTUmPF%3DFsGWTA%40mail.gmail.com
>
> That would have a few advantages. One, we wouldn't need to know the
> number of elements in advance, because simplehash can grow
> dynamically. Two, we could use the iteration interfaces to walk the
> hash table. Your solution to that is pgrhash_seq_search, but that's
> actually not well-designed, because it's not a generic iterator
> function but something that knows specifically about the 'touch' flag.
> I incidentally suggest renaming 'touch' to 'matched;' 'touch' is not
> bad, but I think 'matched' will be a little more recognizable.
>

Thanks for the suggestion. Will try to implement the same and update
accordingly.
I am assuming that I need to build the patch based on the changes that you
proposed on the mentioned thread.

> Please run pgindent. If needed, first add locally defined types to
> typedefs.list, so that things indent properly.
>
> It's not a crazy idea to try to share some data structures and code
> between the frontend and the backend here, but I think
> src/common/backup.c and src/include/common/backup.h is a far too
> generic name given what the code is actually doing. It's mostly about
> checksums, not backup, and I think it should be named accordingly. I
> suggest removing "manifestinfo" and renaming the rest to just talk
> about checksums rather than manifests. That would make it logical to
> reuse this for any other future code that needs a configurable
> checksum type. Also, how about adding a function like:
>
> extern bool parse_checksum_algorithm(char *name, ChecksumAlgorithm *algo);
>
> ...which would return true and set *algo if name is recognized, and
> return false otherwise. That code could be used on both the client and
> server sides of this patch, and by any future patches that want to
> return this scaffolding.
>

Corrected the filename and implemented the function as suggested.

> The file header for backup.h has the wrong filename (string.h). The
> header format looks somewhat atypical compared to what we normally do,
> too.

My bad, corrected the header format as well.

>
>
It's arguable, but I tend to think that it would be better to
> hex-encode the CRC rather than printing it as an integer. Maybe
> hex_encode() is another thing that could be moved into the new
> src/common file.

We are already encoding the CRC checksum as well. Please let me know if I
misunderstood anything.
Moved hex_encode into src/common.

> As I said before about Rushabh's patch set, it's very confusing that
> we have so many patches here stacked up. Like, you have 0002 moving
> stuff, and then 0003 moving it again. That's super-confusing. Please
> try to structure the patch set so as to make it as easy to review as
> possible.
>

Sorry for the confusion. I have squashed 0001 to 0003 patches in one patch.

> Regarding the test case patch, error checks are important! Don't do
> things like this:
>
> +open my $modify_file_sha256, '>>',
> "$tempdir/backup_verify/postgresql.conf";
> +print $modify_file_sha256 "port = 5555\n";
> +close $modify_file_sha256;
>
> If the open fails, then it and the print and the close are going to
> silently do nothing. That's bad. I don't know exactly what the
> customary error-checking is for things like this in TAP tests, but I
> hope it's not like this, because this has a pretty fair chance of
> looking like it's testing something that it isn't. Let's figure out
> what the best practice in this area is and adhere to it.
>

Rajkumar has fixed this, please find attached 0003 patch for same.

Please find attached v2 set patches.

TODO: will implement the simplehash as suggested.

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

--
--

Thanks & Regards,
Suraj kharage,
EnterpriseDB Corporation,
The Postgres Database Company.

Attachment Content-Type Size
v2-0001-Backup-manifest-with-file-names-sizes-timestamps-.patch application/octet-stream 39.8 KB
v2-0002-Implemenation-of-backup-validator.patch application/octet-stream 20.1 KB
v2-0003-Tap-test-case-patch-to-verify-the-backup-using-ve.patch application/octet-stream 7.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2019-12-12 12:40:06 Re: Wrong assert in TransactionGroupUpdateXidStatus
Previous Message Peter Eisentraut 2019-12-12 11:46:21 Unicode normalization SQL functions