Quick Links

Re: A costing analysis tool

From:	"Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: A costing analysis tool
Date:	2005-10-17 21:21:55
Message-ID:	20051017212155.GU86144@pervasive.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Fri, Oct 14, 2005 at 02:37:37PM -0400, Tom Lane wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> > I propose capturing only three values from the output of explain
> > analyze, and saving it with many columns of context information.
>
> You really have to capture the rowcounts (est and actual) too.
> Otherwise you can't tell if it's a costing problem or a statistics
> problem.
>
> More generally, I think that depending entirely on EXPLAIN ANALYZE
> numbers is a bad idea, because the overhead of EXPLAIN ANALYZE is both
> significant and variable depending on the plan structure. The numbers
> that I think we must capture are the top-level EXPLAIN cost and the
> actual runtime of the query (*without* EXPLAIN). Those are the things
> we would like to get to track closely. EXPLAIN ANALYZE is incredibly
> valuable as context for such numbers, but it's not the thing we actually
> wish to optimize.

The problem with that is that we lose all data on per-node costs versus
what the planner thought should happen. ISTM it would be better to run
all 3 scenarios: explain, explain analyze, and select count(*). As for
the caching issue that raises, I don't buy into the theory that all
testing should be done with nothing in the cache, because it's entirely
unrealistic in most cases. I think what makes a lot more sense is to do
two runs, clearing the cache and swapping the order of analyze and
count(*) between the two. That would give us a complete set of data: not
only would we know how things break down at a node-by-note level, we'd
also know how caching affected things. Given some clever data-mining,
one could probably even produce cost estimates for the overhead of
explain analyze which could be factored into further analysis.

Of course the one downside is this doubles the amount of time it takes
for a test to run...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

In response to

Re: A costing analysis tool at 2005-10-14 18:37:37 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Jim C. Nasby	2005-10-17 21:31:38	Re: A costing analysis tool
Previous Message	Jim C. Nasby	2005-10-17 21:17:14	Re: A costing analysis tool