| From: | Stephen Frost <sfrost(at)snowman(dot)net> |
|---|---|
| To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
| Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robins Tharakan <tharakan(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
| Subject: | Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump |
| Date: | 2018-01-26 16:56:09 |
| Message-ID: | 20180126165609.GI2416@tamriel.snowman.net |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Robert,
* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> On Fri, Jan 26, 2018 at 11:09 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > * Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> >> On Thu, Jan 25, 2018 at 4:52 PM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> >> > My proposal is that instead of looking at three hundred lines, you'd
> >> > look for 50 lines of `pg_restore -l` output -- is element XYZ in there
> >> > or not. Quite a bit simpler for the guy adding a new test. This tests
> >> > combinations of pg_dump switches: are we dumping the right set of
> >> > objects.
> >>
> >> My counter-proposal is that we remove the test entirely. It looks
> >> like an unmaintainable and undocumented mess to me, and I doubt
> >> whether the testing value is sufficient to justify the effort of
> >> updating it every time anyone wants to change something in pg_dump.
> >
> > Considering it turned up multiple serious bugs, particularly in the
> > binary upgrade path, I can't disagree more. If you have a counter
> > proposal which actually results in better test coverage, that'd be
> > fantastic, but I wholly reject the notion that we should be considering
> > reducing our test coverage of pg_dump.
>
> I figured you would, but it's still my opinion. I guess my basic
> objection here is to the idea that we somehow know that the 6000+ line
> test case file actually contains only correct tests. That vastly
> exceeds the ability of any normal human being to verify correctness,
> especially given what's already been said about the interdependencies
> between different parts of the file and the lack of adequate
> documentation.
I don't mean to discount your opinion at all, but I'm quite concerned
about the code coverage of pg_dump. I certainly agree that the testing
of pg_dump can and should be improved and I'd like to find time to work
on that, but simply throwing this out strikes me as a step backwards,
not forwards, even for its difficulty.
> For example, I notice that the file contains 166 copies of
> only_dump_test_schema => 1 (with 4 different variations as to how
> much whitespace is included). Some of those are in the "like" clause
> and some are in the "unlike" clause. If one of them were misplaced,
> and pg_dump is actually producing the wrong output, the two errors
> would cancel out and I suspect nobody would notice. If somebody makes
> a change to pg_dump that changes which things get dumped when --schema
> is used, then a lot of those lines would need to moved around. That
> would be a huge nuisance. If the author of such a patch just stuck
> those lines where they make the tests pass, they might fail to notice
> if one of them were actually in the wrong place, and a bug would go
> undiscovered. Some people probably have the mental stamina to audit
> 166 separate cases and make sure that each one is properly positioned,
> but some people might not; and that's really just one test of many.
> Really, every pg_dump change that alters behavior needs to
> individually reconsider the position of every one of ~6000 lines in
> this file, and nobody is really going to do that.
>
> There's some rule that articulates what the effect of --schema is
> supposed to be. I don't know the exact rule, but it's probably
> something like "global objects shouldn't get dumped and schema objects
> should only get dumped if they're in that schema". That rule ought to
> be encoded into this file in some recognizable form. It's encoded
> there, but only in the positioning of hundreds of separate lines of
> code that look identical but must be positioned differently based on a
> human programmer's interpretation of how that rule applies to each
> object type. That's not a good way of implementing the rule; nobody
> can look at this and say "oh, well I changed the rules for --schema,
> so here's which things need to be updated". You're not going to be
> able to look at the diff somebody produces and have any idea whether
> it's right, or at least not without a lot of very time-consuming
> manual verification. If you had just saved the output of pg_dump in a
> file (maybe with OIDs and other variable bits stripped out) and
> compared the new output against the old, it would give you just as
> much code coverage but probably be a lot easier to maintain. If you
> had implemented the rules programmatically instead of encoding a giant
> manually precomputed data structure in the test case file it would be
> a lot more clearly correct and again easier to maintain.
The existing code already has some of these type of rules encoded in it,
specifically to reduce the number of like/unlike cases where it's
possible to do so- see the discussion of catch-all tests, described on
about line 282. While there may be other such rules which can be
encoded, I don't believe they'd end up being nearly as clean as you're
suggesting unless we go through and make changes to pg_dump itself to
consistently behave according to those rules. Perhaps that's a way to
go, I'll admit that I was focusing more on making sure that the
documented behavior is what pg_dump was actually doing, and so I hadn't
considered how we could simplify the testing if pg_dump was stipulated
to operate in a more specific manner.
The notion of using the TOC and then splitting up the output as
suggested by Alvaro seems like a good approach to me. Perhaps there's a
way to do that and to also encode more rules about how pg_dump is
expected to operate.
> I think you've chosen a terrible design and ought to throw the whole
> thing away and start over.
I'll all for throwing away the existing test once we've got something
that covers at least what it does (ideally more, of course).
Thanks!
Stephen
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tom Lane | 2018-01-26 17:09:51 | Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump |
| Previous Message | Robert Haas | 2018-01-26 16:38:29 | Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump |