From: "Jim C. Nasby" <jnasby(at)pervasive(dot)com>
To: Guillaume Cottenceau <gc(at)mnc(dot)ch>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: planner with index scan cost way off actual cost, advices to tweak cost constants?
Date: 2006-03-21 11:58:35
Message-ID: 20060321115835.GN15742@pervasive.com
Lists: pgsql-performance
On Tue, Mar 21, 2006 at 11:13:06AM +0100, Guillaume Cottenceau wrote:
> "Jim C. Nasby" <jnasby 'at' pervasive.com> writes:
>
> [...]
>
> > > My point is that the planner's cost estimate is way above the
> > > actual cost of the query, so the planner doesn't use the best
> > > plan. Even if the index returns that many rows, the actual cost of
> > > the query is such that an index scan (worst case, all disk cache
> > > flushed) is still better than a seq scan, but the planner uses a seq scan.
> >
> > Yes. The cost estimator for an index scan supposedly does a linear
> > interpolation between a minimum cost and a maximum cost depending on the
> > correlation of the first field in the index. The problem is that while
> > the comment states it's a linear interpolation, the actual formula
> > squares the correlation before interpolating. This means that unless the
> > correlation is very high, you're going to get an unrealistically high
> > cost for an index scan. I have data that supports this at
> > http://stats.distributed.net/~decibel/, but I've never been able to get
> > around to testing a patch to see if it improves things.
>
> Interesting.
>
> It would be nice to investigate the arguments behind the choice
> you describe for the formula used to perform the interpolation. I
> have absolutely no knowledge on pg internals so this is rather
> new/fresh for me, I have no idea how smart that choice is (but
> based on my general feeling about pg, I'm suspecting this is
> actually smart but I am not smart enough to see why ;p).
If you feel like running some tests, you need to change
run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);
in src/backend/optimizer/path/costsize.c to something like
run_cost += max_IO_cost + fabs(indexCorrelation) * (min_IO_cost - max_IO_cost);
(note fabs(), not abs() -- indexCorrelation is a double, and abs() would
truncate it to an integer).
That might not produce a perfect cost estimate, but I'll wager that it
will be substantially better than what's in there now. FYI, see also
http://archives.postgresql.org/pgsql-performance/2005-04/msg00669.php
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461