From: | Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Bruce Momjian <bruce(at)momjian(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: the big picture for index-only scans |
Date: | 2011-06-19 19:13:49 |
Message-ID: | BANLkTi=DV5UtxFhCmDkM3E-a6u8aC+WJ_Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
2011/6/19 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Sun, Jun 19, 2011 at 11:12 AM, Cédric Villemain
> <cedric(dot)villemain(dot)debian(at)gmail(dot)com> wrote:
>>> Good point --- we would be making plan decisions based on the visibility
>>> map coverage. The big question is whether visibility map changes are
>>> more dynamic than the values we already plan against, like rows in the
>>> table, table size, and value distributions. I don't know the answer.
>>
>> Robert, I though of Covered-Index as just a usage of the vis map:
>> don't take the heap block if not needed. This look easier to do and
>> better in the long term (because read-only DB may quickly turn into a
>> no-heap access DB for example). Thus this is not real covered-index.
>> Did you want to implement real covered-index and did you have ideas on
>> how to do that ? Or just an optimization of the current
>> planner/executor on index usage ?
>
> If by a "real" covered index you mean one that includes visibility
> info in the index - I have no plans to work on anything like that. If
> we were to do that, the index would become much larger and less
> efficient, whereas the approach of just optimizing the way our
> existing indexes are used doesn't have that disadvantage. It also
> sounds like a lot of work. Now, if someone else wants to demonstrate
> that it has advantages that are worth the costs and go do it, more
> power to said person, but I'm unexcited about it.
Yes I was thinking of that, and agree with you.
>
>> I don't know VM internals:
>>
>> * do we have a counter of ALL_VISIBLE flag set on a relation ? (this
>> should be very good for planner)
>> * do we need a pg_class.rel_vmvisible ?! (I have hands up, don't
>> shoot pleeaase)
>
> Evidently I'm developing a more frightening reputation than I would hope. :-(
Nah, I was joking :) don't worry !
Probably because I have already proposing 1 new GUC and at least one
new column to pg_class recently. (and we're not used to change that
frequently)
>
> Anyway, yes, I do believe we need a table-level statistic for the
> percentage of the visibility map bits that are believed to be set.
> Having said that I think we need it, let me also say that I'm a bit
> skeptical about how well it will work. There are two problems:
>
> 1. Consider a query like "SELECT a, b FROM foo WHERE a = 1". To
> accurately estimate the cost of executing this query via an index-only
> scan (on an index over foo (a, b)), we need to know (i) the percentage
> of rows in the table for which a = 1 and (ii) the percentage *of those
> rows* which are on an all-visible page. We can assume that if 80% of
> the rows in the table are on all-visible pages, then 80% of the rows
> returned by this query will be on all-visible pages also, but that
> might be wildly wrong. This is similar to the problem of costing
> "SELECT * FROM foo WHERE a = 1 AND b = 1" - we know the fraction of
> rows where a = 1 and the fraction where b = 1, but there's no
> certainty that multiplying those values will produce an accurate
> estimate for the conjunction of those conditions. The problem here is
> not as bad as the general multi-column statistics problem because a
> mistake will only bollix the cost, not the row count estimate, but
> it's still not very nice.
>
> 2. Since VACUUM and ANALYZE often run together, we will be estimating
> the percentage of rows on all-visible pages just at the time when that
> percentage is highest. This is not exactly wonderful, either...
>
> I have a fair amount of hope that even with these problems we can come
> up with some adjustment to the planner that is better than just
> ignoring the problem, but I am not sure how difficult it will be.
>
>> * is it ok to parse VM for planning (I believe it is not) ?
>
> It doesn't seem like a good idea to me, but I just work here. I'm not
> sure what that would buy us.
All true, and I won't be unhappy to have the feature as a bonus, not
expected by the planner(for the cost part) but handled by the
executor.
--
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
From | Date | Subject | |
---|---|---|---|
Next Message | Florian Pflug | 2011-06-19 19:26:43 | Re: Adding a distinct "pattern" type to resolve the "~" commutator stalemate |
Previous Message | Robert Haas | 2011-06-19 18:56:43 | Re: Adding a distinct "pattern" type to resolve the "~" commutator stalemate |