From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A costing analysis tool
Date: 2005-10-13 06:30:44
Message-ID: s34db8d4.059@gwmta.wicourts.gov
Lists: pgsql-hackers
Good points, Tom. (I wish my client's email software supported
quoting so that I could post replies closer to your points. Sorry
'bout that.)
I tried searching the archives, though, and the words I could think
to search with generated so many hits that it seemed more or less
like a sequential search of the archives, which is daunting. If you
have any particular references, suggestions for search strings I might
have missed, or even a time range when you think it was discussed,
I'll gladly go looking again. I'm not out to reinvent the wheel, lever,
or any other basic idea.
To cover the "database fits in RAM" situation, we could load some
data, run test cases twice, using only the info from the second run,
and never flush. Then we could load more data and get on to the
cases where not everything is cached. I don't think we can go really
huge, since these tests have to run in a reasonable amount of time, but
I hope we can load enough to cover the major scaling effects.
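To make the warm-cache pass concrete, here's a rough sketch of what the
measurement loop might look like (hypothetical Python over psycopg2; the
query list, connection string, and cost regex are placeholders, not part
of any existing tool):

import re
import time

import psycopg2

TEST_QUERIES = [
    "SELECT count(*) FROM test_fact WHERE key < 1000",
    "SELECT * FROM test_fact ORDER BY key LIMIT 50",
]

# Pulls the total estimated cost out of the plan's top line, e.g.
# "Seq Scan on test_fact  (cost=0.00..155.00 rows=10000 width=4)".
COST_RE = re.compile(r"cost=\d+\.\d+\.\.(\d+\.\d+)")

def measure(conn, sql):
    """Return (estimated_cost, seconds_for_second_run) for one test query."""
    with conn.cursor() as cur:
        cur.execute("EXPLAIN " + sql)
        top_line = cur.fetchone()[0]
        est_cost = float(COST_RE.search(top_line).group(1))
        cur.execute(sql)      # first run just warms the cache
        cur.fetchall()
        start = time.monotonic()
        cur.execute(sql)      # second run: everything should be cached
        cur.fetchall()
        return est_cost, time.monotonic() - start

if __name__ == "__main__":
    conn = psycopg2.connect("dbname=costtest")
    for sql in TEST_QUERIES:
        cost, secs = measure(conn, sql)
        print("%9.4fs  est=%12.2f  %s" % (secs, cost, sql))

The first execution just warms the cache and is thrown away; only the
second run's timing gets paired with the planner's estimated total cost.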
So far my wildest dreams have not gone beyond a few simple math
operations to get to a cost estimate. Only testing will tell, but I
don't think the extra arithmetic will add significant overhead compared
to the other things going on in the planner. (Especially if I can
compensate by talking you into
letting me drop that ceil function on the basis that without it we're
getting the statistical average of the possible actual costs.) It's even
possible that more accurate costing of the current alternatives will
reduce the need for other, more expensive, optimizer
enhancements. (That glass is half FULL, I SWEAR it!)
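(As a toy illustration of that statistical-average argument, assuming
the ceil in question rounds a fractional per-scan page estimate up to
whole pages; the numbers here are made up:)

import math
import random

random.seed(42)
expected_pages = 2.3   # hypothetical fractional page estimate for one scan

# Over many scans the actual count is sometimes 2 pages and sometimes 3,
# averaging out near 2.3; always charging ceil(2.3) = 3 overstates it.
trials = 100000
simulated_mean = sum(
    math.floor(expected_pages) + (random.random() < expected_pages % 1)
    for _ in range(trials)
) / trials

print("ceil estimate :", math.ceil(expected_pages))
print("raw estimate  :", expected_pages)
print("simulated mean: %.3f" % simulated_mean)

Over enough scans the simulated mean lands on the unrounded estimate,
which is the sense in which dropping the ceil gives the statistical
average of the possible actual costs.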
How do you establish that a cost estimate is completely out of line
with reality except by comparing its runtime/estimate ratio with
others? Unless you're saying not to look at just the summary level,
in which case I totally agree -- any one subplan which has an
unusual ratio in either direction needs to be examined. If you're
getting at something else, please elaborate -- I don't want to miss
anything.
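To sketch the kind of subplan-level check I have in mind (hypothetical
Python over the text output of EXPLAIN ANALYZE; the ten-to-one outlier
threshold is just a placeholder):

import re
import statistics

# Matches one plan node line from EXPLAIN ANALYZE text output, e.g.
# "Index Scan using foo_pk on foo  (cost=0.00..8.27 rows=1 width=44)
#     (actual time=0.031..0.033 rows=1 loops=1)"
NODE_RE = re.compile(
    r"cost=\d+\.\d+\.\.(?P<cost>\d+\.\d+).*"
    r"actual time=\d+\.\d+\.\.(?P<ms>\d+\.\d+)"
)

def node_ratios(plan_lines):
    """Yield (runtime_ms / estimated_cost, line) for nodes reporting both."""
    for line in plan_lines:
        m = NODE_RE.search(line)
        if m and float(m.group("cost")) > 0:
            yield float(m.group("ms")) / float(m.group("cost")), line.strip()

def flag_outliers(plan_lines, factor=10.0):
    """Return nodes whose ratio is more than `factor` off the plan's median."""
    ratios = list(node_ratios(plan_lines))
    if not ratios:
        return []
    median = statistics.median(r for r, _ in ratios)
    return [(r / median, line) for r, line in ratios
            if r > median * factor or r < median / factor]

Nodes that land an order of magnitude off the plan's median ratio would
be the first places to look for a cost model that's out of line rather
than merely miscalibrated.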
Thanks for your response.
-Kevin
>>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> 10/13/05 12:01 AM >>>
"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> Note that I'm talking about a tool strictly to check the accuracy of
> the estimated costs of plans chosen by the planner, nothing else.
We could definitely do with some infrastructure for testing this.
I concur with Bruce's suggestion that you should comb the archives
for previous discussions --- but if you can work on it, great!
> (2) A large database must be created for these tests, since many
> issues don't show up in small tables. The same data must be generated
> in every database, so results are comparable and reproducible.
Reproducibility is way harder than it might seem at first glance.
What's worse, the obvious techniques for creating reproducible numbers
amount to eliminating variables that are important in the real world.
(One of which is size of database --- some people care about
performance of DBs that fit comfortably in RAM...)
Realistically, the planner is never going to have complete information.
We need to design planning models that generally get the right answer,
but are not so complicated that they (a) are impossible to maintain
or (b) take huge amounts of time to compute. (We're already getting
some flak on the time the planner takes.) So there is plenty of need
for engineering compromise here. Still, you can't engineer without
raw data, so I'm all for creating a tool that lets us gather real-world
cost data.
The only concrete suggestion I have at the moment is to not design the
tool directly around "measure the ratio of real time to cost". That's
only meaningful if the planner's cost model is already basically correct
and you are just in need of correcting the cost multipliers. What we
need for the near term is ways of quantifying cases where the cost
models are just completely out of line with reality.
regards, tom lane