table/index options | was: COUNT(*) and index-only scans

From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: table/index options | was: COUNT(*) and index-only scans
Date: 2011-10-10 19:16:10
Message-ID: CAF6yO=1fepY2x9Ec60sqthmk4RPrZJ5JBW=Wm5jQFZQAPFk7OA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2011/10/10 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Mon, Oct 10, 2011 at 1:36 PM, Kevin Grittner
> <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>>
>> That gives you an index-only scan; but without the WHERE clause it
>> uses a seq scan.  I think it's mainly a matter of doing enough
>> benchmarks to figure out how best to model the costs of the index
>> scan so that it can be picked for that case.
>
> Right now, our costing model for index-only scans is pretty dumb.  It
> assumes that using an index-only scan will avoid 10% of the heap
> fetches.  That could easily be low, and on an insert-only table or one
> where only the recently-updated rows are routinely accessed, it could
> also be high.  To use an index-only scan for a full-table COUNT(*),
> we're going to have to be significantly smarter, because odds are good
> that skipping 10% of the heap fetches won't be sufficient inducement
> to the planner to go that route; we are going to need a real number.

I have to raise that I think we are going to face the exact same issue
with the visibility_fraction that we face with the hack to set
random_page_cost very low to help optimizer (when index/table is
mostly in cache).

4 options have been viewed so far:
1. pg_class (initial proposal to store the cache estimates)
2. pg_class_nt (revived by Alvaro IIRC)
3. reloption
4. GUC (by Tom for visibility_fraction)

I am in favor of 1 or 2, 4 is a backup option, and 3 an open door to
planner hint (others also let DBA use its knowledge if he wants, but 3
make it mandatory for the DBA to decide, and no automatic way can be
used to update it, except if someone make ALTER TABLE lock free)

(It does not prevent a cost_indexonly() to be written meawhile...)

What do you think/prefer/suggest ?

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2011-10-10 19:18:23 Re: COUNT(*) and index-only scans
Previous Message Kevin Grittner 2011-10-10 19:15:07 Re: COUNT(*) and index-only scans