From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robins Tharakan <tharakan(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump
Date: 2018-01-25 23:57:24
Message-ID: 20180125235724.GY2416@tamriel.snowman.net
Lists: pgsql-hackers

Alvaro,
* Alvaro Herrera (alvherre(at)alvh(dot)no-ip(dot)org) wrote:
> Stephen Frost wrote:
> > Alvaro, Tom,
> >
> > * Alvaro Herrera (alvherre(at)alvh(dot)no-ip(dot)org) wrote:
>
> > > Crazy idea: maybe a large fraction of that test could be replaced
> > > with comparisons of the "pg_restore -l" output file rather than
> > > pg_dump's text output (i.e. the TOC entry for each object, rather
> > > than the object's textual representation).  That sounds easier in
> > > code than the current implementation.  Separately, verify that the
> > > textual representation for each TOC entry type is what we expect.
> >
> > I'm not sure how that's different?  We already check that the textual
> > representation is what we expect, both directly from pg_dump and from
> > pg_restore output, using the exact same code for both (which I
> > generally think is a good thing, since we want the results from both
> > to more-or-less match up).  What you're proposing sounds like throwing
> > that away: we'd keep all the same code for testing the textual
> > representation, but only check the listed contents from pg_restore
> > instead of checking that its textual representation is correct.  That
> > doesn't actually reduce the amount of code, though.
>
> Well, the current implementation compares a dozen pg_dump output text
> files, three hundred lines apiece, against a thousand regexes (give or
> take).  Whenever there is a mismatch, what you get is "this regexp
> failed to match <three hundred lines>" (or sometimes "matched when it
> should not have"), so hunting for the mismatch is quite annoying.
Yeah, the big output isn't fun, I agree with that.
> My proposal is that instead of looking at three hundred lines, you'd
> look at 50 lines of `pg_restore -l` output -- is element XYZ in there
> or not?  Quite a bit simpler for the person adding a new test.  This
> tests combinations of pg_dump switches: are we dumping the right set
> of objects?
That might be possible to do, though I'm not sure that it'd really be
only 50 lines.
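
For reference, the `pg_restore -l` listing being discussed looks
roughly like this (the dump IDs, OIDs, names, and header details here
are invented for illustration):

    ;
    ; Archive created at 2018-01-25 12:00:00 EST
    ;     dbname: postgres
    ;     Format: CUSTOM
    ;
    ; Selected TOC Entries:
    ;
    2; 3079 13043 EXTENSION - plpgsql
    2753; 0 0 COMMENT - EXTENSION plpgsql
    203; 1259 16385 TABLE public test_table postgres
    2152; 0 16385 TABLE DATA public test_table postgres

One line per TOC entry, so "is object XYZ in the dump" becomes a
single-line match rather than a multi-line SQL comparison.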
> *Separately* test that each individual TOC entry type ("table data",
> "index", "tablespace") is dumped as this or that SQL command, where you
> create a separate dump file for each object type. So you match a single
> TOC entry to a dozen lines of SQL, half of them comments -- pretty easy
> to see where a mismatch is.
Yikes.  If you're thinking that we would call pg_restore independently
for each object, I'd be worried about the runtime increasing quite a
bit.  There are also a few cases where we check dependencies between
objects that we'd need to be mindful of, though we might be able to
take care of those if we can keep the runtime down.  Certainly, part of
how the existing code works was specifically intended to minimize the
runtime.  Perhaps the approach of minimizing the invocations and
building a bunch of regexps to run against the resulting output is more
expensive than running pg_restore a bunch of times, but I'd want to see
that demonstrated: some really simple timings locally against a 6.5k
-Fc dump file suggest it takes 30-40ms just to produce the TOC.
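
To make sure we're talking about the same mechanics, here's a rough
sketch of the per-entry approach as I understand it, using pg_restore's
-L/--use-list option (the file names and the table are invented for
illustration):

    use strict;
    use warnings;
    use IPC::Run qw(run);

    my ($toc, $sql);
    run [ 'pg_restore', '-l', 'full.dump' ], '>', \$toc
      or die "pg_restore -l failed";

    # Keep just the one TOC entry we want to look at.
    my ($entry) = grep { /\bTABLE public test_table\b/ } split /\n/, $toc;
    open my $fh, '>', 'one_entry.toc' or die $!;
    print $fh "$entry\n";
    close $fh;

    # Restore only that entry, producing its SQL on stdout, and then
    # match the dozen or so resulting lines against what we expect.
    run [ 'pg_restore', '-L', 'one_entry.toc', 'full.dump' ], '>', \$sql
      or die "pg_restore -L failed";

That's one pg_restore invocation per object checked, which is where my
runtime worry comes from.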
This also doesn't really help with the big perl hash, but the two ideas
don't seem to conflict, so perhaps we could still tackle the big perl
hash using the approach I described up-thread and also find a way to
implement this, provided it doesn't make the regression tests too much
slower.
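
As a strawman for what the TOC-based checks might look like in the TAP
test (the dump file name and the patterns are invented; this is not a
patch):

    use strict;
    use warnings;
    use IPC::Run qw(run);
    use Test::More;

    my ($toc, $stderr);
    run [ 'pg_restore', '-l', 'no_comments.dump' ], '>', \$toc,
        '2>', \$stderr
      or die "pg_restore -l failed: $stderr";

    # TOC lines look like:
    #   203; 1259 16385 TABLE public test_table postgres
    like($toc, qr/^\d+; \d+ \d+ TABLE public test_table /m,
        'TOC still lists test_table');
    unlike($toc, qr/^\d+; \d+ \d+ COMMENT /m,
        'no COMMENT TOC entries with --no-comments');

    done_testing();

A failure there reports a one-line pattern against a 50-line listing
rather than a regexp against three hundred lines of SQL, which is the
attraction.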
Thanks!
Stephen