Re: the big picture for index-only scans

From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: the big picture for index-only scans
Date: 2011-05-11 09:51:20
Message-ID: BANLkTimeUEv0MZvGEC8jzotjCpZL9KqZqA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2011/5/11 Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>:
> On 10.05.2011 20:15, Simon Riggs wrote:
>>
>> On Tue, May 10, 2011 at 5:17 PM, Kevin Grittner
>> <Kevin(dot)Grittner(at)wicourts(dot)gov>  wrote:
>>>
>>> Simon Riggs<simon(at)2ndQuadrant(dot)com>  wrote:
>>>
>>>> This topic has been discussed many times, yet I have never seen an
>>>> assessment that explains WHY we would want to do index-only scans.
>>>
>>> In databases with this feature, it's not too unusual for a query
>>> which uses just an index to run one or more orders of magnitude
>>> faster than a query which has to randomly access the heap for each
>>> index entry.  That seems like enough evidence of its possible value
>>> in PostgreSQL to proceed to the point where benchmarks become
>>> possible.  I'm assuming that, like all other features added as
>>> performance optimizations, it won't be committed until there are
>>> benchmarks showing the net benefit.
>>>
>>> As a thought experiment, picture the relative costs of scanning a
>>> portion of an index in index sequence, and being done, versus
>>> scanning a portion of an index in index sequence and jumping to a
>>> random heap access for each index entry as you go.
>>
>> I can picture that. Regrettably, I can also picture the accesses to
>> the visibility map, the maintenance operations on the VM that are
>> needed for this and the contention that both of those will cause.
>
> Note that we already have the visibility map, and the accesses needed to
> update it are already there. Granted, we'll have to change the logic
> slightly to make it crash safe, but I don't expect that to add any
> meaningful overhead - the changes are going to be where the bits are set,
> ie. vacuum, not when the bits are cleared. Granted, we might also want to
> set the bits more aggressively once they're used by index-only-scans. But
> done correctly, just taking advantage of the VM that's already there
> shouldn't add overhead to other operations.

We won't be able to do index-only scan.
We can do index scan with probability to not access heap,
maybe(hopefully) completely in some cases.

IF vis map is ok to remove the need to access heap (perf and safe),
then, for the cost part:
Currently, cost_index materialize the cost to access each heap page by
a random_page_cost. I believe we should be able to change that to
remove the estimated number of heap page we don't need to access (can
be 100% or 0.1%).

And as suggested Simon, there is also maybe a path to improve the
bitmapheap scan. bitmapheap scan have already some workaround to be
sure indexscan looks cheaper in some case, just keep that and apply
same logic than for cost_index.

This is keeping the same rule PostgreSQL has : let the planner decide
the best solution instead of allowing special index declaration (it
hasn't been propose yet I think, but, well, just in case it pops into
the mind of someone)

>
> I agree that we need to do tests to demonstrate that there's a gain from the
> patch, once we have a patch to test. I would be very surprised if there
> isn't, but that just means the testing is going to be easy.
>
> --
>  Heikki Linnakangas
>  EnterpriseDB   http://www.enterprisedb.com
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2011-05-11 10:06:06 Re: potential bug in trigger with boolean params
Previous Message Noah Misch 2011-05-11 09:32:17 Re: 4.1beta1: ANYARRAY disallowed for DOMAIN types which happen to be arrays