Re: A costing analysis tool

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject: Re: A costing analysis tool
Date: 2005-10-16 21:22:23
Message-ID: 200510161422.23958.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Kevin,

> I have to keep a very narrow focus on this, or there is likely that
> nothing will come of it.  The particular area which is my target
> here is the accuracy of the cost values on the subplans
> considered by the optimizer.

Sure. What the rest of us are focused on is helping you build a generally
useful tool which can be used to solve future problems and unexpected
performance issues as well. I really see needing to collect all of the
information possible from EXPLAIN ANALYZE ... not collecting just three bits
and throwing the rest of the stuff away.

I'd use a structure more like:

query_instances (
query text
run_instance int
estimated_cost float
analyze_time float
actual_time float
time_run timestamp
GUCs text[]
)

query_steps (
query_instance FK
step_id SERIAL
parent_step FK
node_name text
node_type text FK
cost_start float
cost_end float
est_rows INT8
time_start float
time_end float
actual_rows INT8
loops INT8
db_object name
condition_type text FK
condition_detail text
)

so, for example, the query step:
"                    ->  Seq Scan on detail0009  (cost=0.00..20500.11
rows=26 width=1005) (actual time=453.000..5983.000 rows=53588 loops=1)"
"                          Filter: ((txcontenttype ~~* '%html%'::text)
AND ((vchost)::text ~~* '%www.%'::text))"

Could be displayed as:

query_instance 12008
step_id 14701
parent_step 14698
node_name Seq Scan on detail0009
node_type Seq Scan
cost_start 0
cost_end 20500.11
est_rows 26
time_start 453.0
time_end 5983.0
actual_rows 53588
loops 1
db_object detail009
condition_type Filter
condition_detail ((txcontenttype ~~* '%html%'::text) AND ((vchost)::text ~~*
'%www.%'::text))

By collecting all of this data, you make it possible to perform other sorts of
analysis on the cost estimates. For example, statistical analysis might
tell us that 3-or-more-condition filters take significantly longer to execute
than single-condition filters, which would be important to know for the cost
model. Limiting it to collecting only 3 of the 13 bits of node data produced
by EA would very much limit the usefulness of the tool and the reliability of
its statistics.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2005-10-16 21:23:41 Re: A costing analysis tool
Previous Message Greg Stark 2005-10-16 21:09:57 Re: slow IN() clause for many cases