Re: A costing analysis tool

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: A costing analysis tool
Date: 2005-10-15 22:37:13
Message-ID: 87zmpae7ae.fsf@stark.xeocode.com
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Hardly --- how will you choose the best subplans if you don't calculate
> their costs?

Uhm, good point. But as you say, not really a problem.

> I'm also a bit suspicious of the "it's all a linear equation" premise,
> because the fact of the matter is that the cost estimates are already
> nonlinear, and are likely to get more so rather than less so as we learn
> more. A case in point is that the reason nestloop costing sucks so
> badly at the moment is that it fails to account for cache effects in
> repeated scans ... which is definitely a nonlinear effect.

That only means the relationship between the estimates for the outside of the
nested loop and the estimates inside the loop isn't simple. But the individual
estimates for both nodes are still just linear equations themselves. That is,
the actual cost for each node is just the result of a simple linear equation
in the parameters estimated at that node.

I think they *have* to be linear equations. If not, the units can't work out
properly. Every operation takes time, and the total amount of time spent in
the query is just the sum of all the time spent in those operations. There
just aren't very many operations that make sense on measures of time,
after all.

In fact I wonder if breaking out the individual parameters would offer a way
out of the nested loop knot. If you know how much time the plan inside the
nested loop is estimated to spend in index lookups specifically, you could
discount that component but keep the other parameters at full value.

Or something like that. It might require breaking random_page_cost into two or
three different parameters that would normally have the same value but aren't
handled the same way, like random_heap_cost, random_leaf_cost, and
random_nonleaf_cost.
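A rough sketch of what that discounting could look like, assuming the
per-parameter breakdown is kept instead of collapsing each node to one
number. Everything here (the parameter names, the discount factors, the
numbers) is hypothetical:

```python
# Sketch of the discounting idea: if an inner plan's cost stays broken
# out by parameter, a repeated-scan estimate can discount just the
# cacheable components. All names and values are hypothetical.

def repeated_scan_cost(per_param_cost, loops, discounts):
    """Cost of running an inner plan `loops` times, scaling each cost
    component by a cache factor in [0, 1] (1.0 = no caching benefit)."""
    return loops * sum(cost * discounts.get(param, 1.0)
                       for param, cost in per_param_cost.items())

# Per-parameter cost of one execution of the inner plan.
inner = {"random_leaf": 8.0, "random_heap": 12.0, "cpu": 2.0}

# Upper index pages stay cached across iterations; heap pages mostly don't,
# so only the leaf-page component gets a deep discount.
discounts = {"random_leaf": 0.1}

naive = repeated_scan_cost(inner, loops=50, discounts={})
cached = repeated_scan_cost(inner, loops=50, discounts=discounts)
print(naive, cached)  # 1100.0 740.0
```

That's exactly why splitting random_page_cost would matter: random_leaf_cost
and random_heap_cost start out equal, but only one of them gets discounted in
the repeated-scan case.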

--
greg
