Re: table/index options | was: COUNT(*) and index-only scans

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: table/index options | was: COUNT(*) and index-only scans
Date: 2011-10-11 11:00:49
Message-ID: CA+TgmoY_nhypU7o_6QoiMSoY-m6qHYc1goWuUZMJWKXdFRCxVQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 10, 2011 at 3:16 PM, Cédric Villemain
<cedric(dot)villemain(dot)debian(at)gmail(dot)com> wrote:
> 2011/10/10 Robert Haas <robertmhaas(at)gmail(dot)com>:
>> On Mon, Oct 10, 2011 at 1:36 PM, Kevin Grittner
>> <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>>>
>>> That gives you an index-only scan; but without the WHERE clause it
>>> uses a seq scan.  I think it's mainly a matter of doing enough
>>> benchmarks to figure out how best to model the costs of the index
>>> scan so that it can be picked for that case.
>>
>> Right now, our costing model for index-only scans is pretty dumb.  It
>> assumes that using an index-only scan will avoid 10% of the heap
>> fetches.  That could easily be low, and on an insert-only table or one
>> where only the recently-updated rows are routinely accessed, it could
>> also be high.  To use an index-only scan for a full-table COUNT(*),
>> we're going to have to be significantly smarter, because odds are good
>> that skipping 10% of the heap fetches won't be sufficient inducement
>> to the planner to go that route; we are going to need a real number.
>
> I have to raise that I think we are going to face the exact same issue
> with the visibility_fraction that we face with the hack to set
> random_page_cost very low to help optimizer (when index/table is
> mostly in cache).
>
> 4 options have been viewed so far:
> 1. pg_class (initial proposal to store the cache estimates)
> 2. pg_class_nt (revived by Alvaro IIRC)
> 3. reloption
> 4. GUC (by Tom for visibility_fraction)
>
> I am in favor of 1 or 2, 4 is a backup option, and 3 an open door to
> planner hint (others also let DBA use its knowledge if he wants, but 3
> make it mandatory for the DBA to decide, and no automatic way can be
> used to update it, except if someone make ALTER TABLE lock free)
>
> (It does not prevent a cost_indexonly() to be written meawhile...)
>
> What do you think/prefer/suggest ?

Well, I think a GUC is kind of useless, because you're going to want
to make this per-table.

As to the rest, I think they're all going to have the same problems -
or non-problems - with ALTER TABLE locking the full table. If that's
a show-stopper, we should try to fix it. But how to do that is a
topic for another thread.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Royce Ausburn 2011-10-11 11:08:27 Index only scan paving the way for "auto" clustered tables?
Previous Message Robert Haas 2011-10-11 10:37:43 Re: COUNT(*) and index-only scans