Re: A costing analysis tool

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: A costing analysis tool
Date: 2005-10-13 19:41:11
Message-ID: 20051013194110.GD6080@svana.org
Lists: pgsql-hackers

On Thu, Oct 13, 2005 at 01:52:10PM -0500, Kevin Grittner wrote:
> Thanks, Josh, for the feedback.
>
> It sounds as though you are more focused on picking up costing
> problems which happen during production -- which is clearly
> valuable, but addresses a somewhat different set of needs than
> I was looking at. That said, it seems like there is potential to share
> significant code between the two techniques. We'll have to see if
> we can work that out.

Firstly, I really hope you get further with this than I did a while ago
when I attempted it. It's certainly a worthy goal.

Secondly, while checking for problems in production systems is good,
it's not going to help with fixing the cost model. For that you need
raw data.

My basic plan was to set up tables of different sizes and attempt to
run queries such as the following (a rough sketch follows the list):

- Index Scan on each table with different types of keys and coverage.
- Seq Scan
- Nested loop, etc...
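
To make that concrete, here is a minimal sketch of the kind of setup
I had in mind; the table, column, and index names are made up for
illustration, and you would repeat this at several row counts:

    -- Hypothetical test table; table size is one of the variables
    -- under study.
    CREATE TABLE cost_test (id integer PRIMARY KEY, val integer, pad text);
    INSERT INTO cost_test
        SELECT i, (random() * 10000)::integer, repeat('x', 100)
        FROM generate_series(1, 1000000) AS i;
    CREATE INDEX cost_test_val_idx ON cost_test (val);
    ANALYZE cost_test;

    -- Index scan, varying the selectivity of the key range:
    EXPLAIN ANALYZE SELECT * FROM cost_test WHERE val < 100;
    -- Seq scan:
    EXPLAIN ANALYZE SELECT count(*) FROM cost_test;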

I did reach the point where I was wishing I could just give PostgreSQL
the plan and tell it to execute it. :) The point of the exercise is to
be able to derive correlations so that you could calculate the actual
costs from the plan. For example, run a nested loop with an inner index
scan once, twice, three times, etc., so we can actually *see* what the
cache effects are.
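
One way to do that, sketched with the same hypothetical table as above,
is to grow the number of outer rows one at a time and compare the
per-loop actual times between runs:

    -- Nested loop with an inner index scan on b's primary key, driven
    -- once, twice, three times; watch the per-loop "actual time"
    -- change as the cache warms up.
    EXPLAIN ANALYZE
        SELECT * FROM cost_test a JOIN cost_test b ON b.id = a.val
        WHERE a.id <= 1;
    -- ... then repeat with a.id <= 2, a.id <= 3, and so on.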

I got stuck on working out how to force the optimiser to produce the
plan I wanted. I didn't try too hard though. The enable_xxx options
should be enough, hopefully. Of course you want to run it with
different numbers of shared buffers to see how they affect the results.
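
As a rough example of the forcing approach (the enable_* settings are
real planner options; which ones to turn off depends on the plan you
are trying to get):

    -- Disable the alternatives so the planner is left with a nested
    -- loop over an inner index scan:
    SET enable_hashjoin = off;
    SET enable_mergejoin = off;
    SET enable_seqscan = off;
    EXPLAIN ANALYZE
        SELECT * FROM cost_test a JOIN cost_test b ON b.id = a.val
        WHERE a.id <= 3;
    RESET ALL;

Note that shared_buffers itself can't be changed with SET; it lives in
postgresql.conf and takes a server restart, so that part of the sweep
has to be scripted around the server.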

And then you ideally want the results from several different machines,
disk subsystems, memory types, etc., placed on a nice web page so other
people can run correlations on the data themselves.

This is essentially what you already came up with. Note that for these
purposes the actual estimates by PostgreSQL are irrelevant. However, I
strongly suggest finding a way of collating the results publicly from
lots of people because digging for correlations is something lots of
people can hammer on and is really hard to program.

Hope this helps,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
