Re: the big picture for index-only scans

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: the big picture for index-only scans
Date: 2011-05-11 14:17:58
Message-ID: 201105111417.p4BEHwl27045@momjian.us
Lists: pgsql-hackers

Simon Riggs wrote:
> On Wed, May 11, 2011 at 1:47 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > Greg Stark wrote:
> >> On a separate note though, Simon, I don't know what you mean by "we
> >> normally start with a problem". It's a free software project and
> >> people are free to work on whatever interests them whether that's
> >> because it solves a problem they have, helps a client who's paying
> >> them, or just because it's of academic interest to them. We don't
> >> always take their patches if they aren't of general interest but
> >> people propose all kinds of crazy experimental ideas all the time.
> >
> > I am confused by Simon's questions too.
> >
> > Simon seems to regularly argue for adding features late in the
> > development cycle and backpatch things no one else thinks should be
> > backpatched, but he wants more research that index-only scans are going
> > to improve things before it is implemented? The first is aggressive
> > development, the second is very conservative development --- they don't
> > match, so I now wonder what the motivation is since it isn't consistent.
>
> Not really sure why reasonable technical skepticism should become
> personal commentary.
>
> You don't question Tom's motives if he is skeptical of an idea of
> mine. Why would you question my motivation? What is *your* motive for
> acting like that?

Tom is consistent in his level of aggressive/conservative development
suggestions. What I am seeing are many cases where you are consistently
pushing for something even though you get almost-overwhelming rejection,
and you keep going. And if it was consistent in one direction, I could
understand because maybe you feel we are too conservative, but if it
isn't consistent, I have no idea how to learn or adjust to your
approach. We clearly have some people on one side of the
conservative/aggressive spectrum, and some on the other side.

Now, I am willing to admit I might be totally wrong, but it has risen to
a level that I felt I should say something in case it is helpful.

> I'm not driven by one setting of "conservatism", but I am interested
> in adding fully usable features that bring credit to the project. If I
> see a feature that can have minor things added to it to improve them,
> then I raise that during beta. If I see things being worked out that
> sound dubious, I mention that in early development.

Yes, that seems fine to me, as stated.

> I don't think this work will materially improve the speed of count(*)
> in the majority of cases. This introduces extra overhead into the code

I think this is the only hope we have of improving count(*) in an active
MVCC system. It might not work, but it has been our only hope of
improvement of count(*) for a while.

> path and that can be a net loss. The only time it will help is when
> you have a large table that is not cached and also not recently
> updated. Is count(*) run very often against such tables? Do we really
> care enough to optimise that use case with lots of special purpose
> code? The very fact that Kevin and yourself bring up different reasons
> for why we need this feature makes me nervous.

Yes, no question. For count(*), you don't care about the indexed
values, only the count, while for Kevin's case you are reading values
from the index. I assume (or hope) that one or both will be a win for
this feature.

> The analysis has not been done yet, and all I have done is request that.

I think we are going to have to write the code and see the performance
hit and where it is a win. Ideally we could figure this out
before-hand, but I don't think that is possible in this case. If you
look at the research in reducing the load of updating the hint bits,
again, it is so complex that only working code and testing will show
whether any improvement is possible there.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
