Re: Planner enhancement suggestion.

From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: PFC <lists(at)peufeu(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Planner enhancement suggestion.
Date: 2006-03-07 18:37:15
Message-ID: 20060307183715.GU82989@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Tue, Mar 07, 2006 at 07:09:15PM +0100, PFC wrote:
>
> >The problem is that you're now talking about doing 2 index scans instead
> >of just one and a sort.
>
> It depends on what you call an index scan :
> a- Scanning just the index (no heap lookup) to create a bitmap

Sure, and then you scan the other index and read the heap at the same
time (b). Your plan requires doing both. The question is: in what cases
will it be faster to scan the extra index and build the bitmap vs. just
doing a sort.

> b- Scanning the index and hitting the heap in index order to
> retrieve the rows
>
> (a) should be quite fast, because indexes generally use less space
> than the main table, and have good locality of reference. (b) is OK if the
> table fits in memory, but if it has to seek on every row from the heap...

If the table fits in memory, who cares? A sort should be damn fast at
that point, because you're dealing with a small set of data.

> So, when doing :
> SELECT * FROM products WHERE category=C ORDER BY price LIMIT 20;
>
> If the category contains few products, using the index on category
> then sorting is good.
> However, if the category contains many items, postgres is likely to
> use the index on price to avoid the sort. It needs to lose time fetching

Have you actually seen this behavior? My experience is that you have to
have a correlation somewhere over 80-90% before an index scan is favored
over a seqscan + sort (which as I mentioned before appears to be
broken).

> The bitmap trick I proposed in my previous post would be even more
> interesting if the table is clustered on category (which seems a
> reasonable thing to do).

In which case it's highly unlikely that using the price index will buy
you anything.

> Does the cost estimator know about this kind of correlation ?

Yes. The problem is that the index scan cost estimator figures out a
best and worst case cost, and then interpolates between the two using
correlation^2. IMO it should be using abs(correlation) to do this, and
there's some data at http://stats.distributed.net/~decibel/ that backs
this up. There's also been some discussions on -hackers (search the
archives for "index cost correlation nasby"), but I've not had time to
follow up on this. If you wanted to test a new index cost formula it
would be a one line change to the code.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Jim C. Nasby 2006-03-07 18:43:27 Re: t1000/t2000 sun-servers
Previous Message PFC 2006-03-07 18:09:15 Re: Planner enhancement suggestion.