From: | Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Tels <nospam-pg-abuse(at)bloodgate(dot)com>, David Steele <david(at)pgmasters(dot)net>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, vignesh C <vignesh21(at)gmail(dot)com> |
Subject: | Re: backup manifests |
Date: | 2019-12-12 12:32:49 |
Message-ID: | CAF1DzPVeJdV3q4oiCBaTs5jMoUYkqNRicAYR-DAHuHGA7ccO=w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Thanks, Robert for the review.
On Wed, Dec 11, 2019 at 1:10 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Dec 10, 2019 at 6:40 AM Suraj Kharage
> <suraj(dot)kharage(at)enterprisedb(dot)com> wrote:
> > Please find attached patch for backup validator implementation (0004
> patch). This patch is based
> > on Rushabh's latest patch for backup manifest.
> >
> > There are some functions required at client side as well, so I have
> moved those functions
> > and some data structure at common place so that they can be accessible
> for both. (0003 patch).
> >
> > My colleague Rajkumar Raghuwanshi has prepared the WIP patch (0005) for
> tap test cases which
> > is also attached. As of now, test cases related to the tablespace and
> tar backup format are missing,
> > will continue work on same and submit the complete patch.
> >
> > With this mail, I have attached the complete patch stack for backup
> manifest and backup
> > validate implementation.
> >
> > Please let me know your thoughts on the same.
>
> Well, for the second time on this thread, please don't take a bunch of
> somebody else's code and post it in a patch that doesn't attribute
> that person as one of the authors. For the second time on this thread,
> the person is me, but don't borrow *anyone's* code without proper
> attribution. It's really important!
>
> On a related note, it's a very good idea to use git format-patch and
> git rebase -i to maintain patch stacks like this. Rushabh seems to
> have done that, but the files you're posting look like raw 'git diff'
> output. Notice that this gives him a way to include authorship
> information and a tentative commit message in each patch, but you
> don't have any of that.
>
Sorry, I have corrected this in the attached v2 patch set.
> Also on a related note, part of the process of adapting existing code
> to a new purpose is adapting the comments. You haven't done that:
>
> + * Search a result-set hash table for a row matching a given filename.
> ...
> + * Insert a row into a result-set hash table, provided no such row is
> already
> ...
> + * Most of the values
> + * that we're hashing are short integers formatted as text, so there
> + * shouldn't be much room for pathological input.
>
Corrected in v2 patch.
> I think that what we should actually do here is try to use simplehash.
> Right now, it won't work for frontend code, but I posted some patches
> to try to address that issue:
>
>
> https://www.postgresql.org/message-id/CA%2BTgmob8oyh02NrZW%3DxCScB%2B5GyJ-jVowE3%2BTWTUmPF%3DFsGWTA%40mail.gmail.com
>
> That would have a few advantages. One, we wouldn't need to know the
> number of elements in advance, because simplehash can grow
> dynamically. Two, we could use the iteration interfaces to walk the
> hash table. Your solution to that is pgrhash_seq_search, but that's
> actually not well-designed, because it's not a generic iterator
> function but something that knows specifically about the 'touch' flag.
> I incidentally suggest renaming 'touch' to 'matched;' 'touch' is not
> bad, but I think 'matched' will be a little more recognizable.
>
Thanks for the suggestion. Will try to implement the same and update
accordingly.
I am assuming that I need to build the patch based on the changes that you
proposed on the mentioned thread.
> Please run pgindent. If needed, first add locally defined types to
> typedefs.list, so that things indent properly.
>
> It's not a crazy idea to try to share some data structures and code
> between the frontend and the backend here, but I think
> src/common/backup.c and src/include/common/backup.h is a far too
> generic name given what the code is actually doing. It's mostly about
> checksums, not backup, and I think it should be named accordingly. I
> suggest removing "manifestinfo" and renaming the rest to just talk
> about checksums rather than manifests. That would make it logical to
> reuse this for any other future code that needs a configurable
> checksum type. Also, how about adding a function like:
>
> extern bool parse_checksum_algorithm(char *name, ChecksumAlgorithm *algo);
>
> ...which would return true and set *algo if name is recognized, and
> return false otherwise. That code could be used on both the client and
> server sides of this patch, and by any future patches that want to
> return this scaffolding.
>
Corrected the filename and implemented the function as suggested.
> The file header for backup.h has the wrong filename (string.h). The
> header format looks somewhat atypical compared to what we normally do,
> too.
My bad, corrected the header format as well.
>
>
It's arguable, but I tend to think that it would be better to
> hex-encode the CRC rather than printing it as an integer. Maybe
> hex_encode() is another thing that could be moved into the new
> src/common file.
We are already encoding the CRC checksum as well. Please let me know if I
misunderstood anything.
Moved hex_encode into src/common.
> As I said before about Rushabh's patch set, it's very confusing that
> we have so many patches here stacked up. Like, you have 0002 moving
> stuff, and then 0003 moving it again. That's super-confusing. Please
> try to structure the patch set so as to make it as easy to review as
> possible.
>
Sorry for the confusion. I have squashed 0001 to 0003 patches in one patch.
> Regarding the test case patch, error checks are important! Don't do
> things like this:
>
> +open my $modify_file_sha256, '>>',
> "$tempdir/backup_verify/postgresql.conf";
> +print $modify_file_sha256 "port = 5555\n";
> +close $modify_file_sha256;
>
> If the open fails, then it and the print and the close are going to
> silently do nothing. That's bad. I don't know exactly what the
> customary error-checking is for things like this in TAP tests, but I
> hope it's not like this, because this has a pretty fair chance of
> looking like it's testing something that it isn't. Let's figure out
> what the best practice in this area is and adhere to it.
>
Rajkumar has fixed this, please find attached 0003 patch for same.
Please find attached v2 set patches.
TODO: will implement the simplehash as suggested.
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
--
--
Thanks & Regards,
Suraj kharage,
EnterpriseDB Corporation,
The Postgres Database Company.
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Backup-manifest-with-file-names-sizes-timestamps-.patch | application/octet-stream | 39.8 KB |
v2-0002-Implemenation-of-backup-validator.patch | application/octet-stream | 20.1 KB |
v2-0003-Tap-test-case-patch-to-verify-the-backup-using-ve.patch | application/octet-stream | 7.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2019-12-12 12:40:06 | Re: Wrong assert in TransactionGroupUpdateXidStatus |
Previous Message | Peter Eisentraut | 2019-12-12 11:46:21 | Unicode normalization SQL functions |