From: Mischa Sandberg <ischamay(dot)andbergsay(at)activestateway(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Indexed views?
Date: 2004-09-12 21:54:05
Message-ID: 1A31d.194860$X12.99998@edtnps84
Lists: pgsql-hackers
Greg Stark wrote:
> Mischa Sandberg <ischamay(dot)andbergsay(at)activestateway(dot)com> writes:
>>I take it that it is a very reasonable assumption that only a small proportion
>>of index records are actually invalid (else Yurk why use the index?).
> That's faulty logic, the percentage of tuples that are valid is entirely
> independent from the percentage of tuples that match your range criterion. Ie,
> I could be selecting 100 tuples out of a million -- even if 99 are invalid
> it's still worthwhile to use the index.
Ummm ... perhaps I glossed over a bit of inference. If only a small
proportion of the index contains invalid row references, then (absent
any specific bias to the contrary) arbitrary queries using that index
will average the same proportion of invalid row references. And agreed,
it would still be worthwhile to use the index in that case. Your
ANALYZE stats would be a bit skewed, though.
>>Since you're using an index at all, the planner must be expecting a restricted
>>set of rows to make it up through to the root. If there is any filter criteria
>>against the values from the index rows, you won't even have to check rows for
>>tuple visibility, that don't pass that filter.
>
> It's an interesting idea though. But I can't think of many queries where it
> would be interesting. The query would still have to visit every page
> containing a record used in the final result. So the only time this would be a
> significant win is if you're applying very selective restrictions to columns
> that were in the index but weren't able to put in the index condition.
>
> This seems like a pretty rare situation; usually the reason you put columns in
> an index definition is because it is going to be useful for index conditions--
> especially if it's a particularly selective column.
Ummm ... two situations where filters on index columns do not fit the
standard index probe are:
- filtering by a restrictive join. Whether the index is the source or
the target of the restriction, you get a better choice of join
operators/orders. For example, if the index is the restrictor, you may
be able to build a hash table of (filtered) index rows, whereas
building a hash table from a heap scan would be a bad choice.
- filtering on non-leading index columns, or filtering with
inequalities. Normally the planner will not bother with the index for
these, and may do a serial scan of the table instead. It could be done
with a serial scan of the index, with possible optimizations like
Oracle's "skip scan".
Furthermore, what 'covering' indexes buy you is that they hold all the
data you need for the query results, whether or not you apply
predicates to every one of those columns.
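
To make that concrete, here is a toy sketch in C -- not PostgreSQL
code, all the names are made up -- of the idea that when the filter
column is stored in the index, the predicate can be tested against the
index entry itself, and the trip to the heap for the visibility check
is only paid for entries that pass:

#include <stdbool.h>
#include <stddef.h>

typedef struct HeapRow {
    bool visible;        /* stand-in for the full MVCC visibility check */
    /* ... other columns ... */
} HeapRow;

typedef struct IndexEntry {
    int      key;        /* indexed (covering) column value */
    HeapRow *heap;       /* stand-in for a TID pointing into the heap */
} IndexEntry;

/* Count index entries with lo <= key < hi.  Entries that fail the
 * filter never cost a heap visit at all. */
static size_t
count_matches(const IndexEntry *idx, size_t n, int lo, int hi)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < n; i++)
    {
        if (idx[i].key < lo || idx[i].key >= hi)
            continue;                 /* filtered out in the index */
        if (idx[i].heap->visible)     /* heap visit only for survivors */
            count++;
    }
    return count;
}

And if the index also covers every column the query returns, the heap
visit buys you nothing but the visibility check.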
At another level, people are talking about decomposition storage models
(DSM) for data in disk pages, versus n-ary storage models (NSM). That's
more or less a fancy way of saying: organize the data in a page by
column instead of by row. This storage model pays you back in CPU
cycles on most machines, thanks to the L1/L2 cache hierarchy. If PG
were ever to move to that, the row validity information would be
grouped together within a page, and verification of index rows would be
significantly faster: only a small portion of a large page needs to be
read and pushed through the CPU.
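As a rough illustration of why the column-wise layout helps the CPU --
again just a sketch with made-up struct names, not anything PG does
today:

#include <stdbool.h>
#include <stdint.h>

#define ROWS_PER_PAGE 256

/* NSM: whole rows stored one after another; each row's validity flag
 * sits next to ~100 bytes of unrelated data. */
typedef struct NsmRow {
    bool    valid;
    int64_t key;
    char    payload[100];
} NsmRow;

typedef struct NsmPage {
    NsmRow rows[ROWS_PER_PAGE];
} NsmPage;

/* DSM: every column, validity flags included, stored as its own
 * array within the page. */
typedef struct DsmPage {
    bool    valid[ROWS_PER_PAGE];  /* all flags packed together */
    int64_t key[ROWS_PER_PAGE];
    char    payload[ROWS_PER_PAGE][100];
} DsmPage;

/* Checking validity of every row in a DSM page reads only the valid[]
 * array, a few consecutive cache lines; the equivalent loop over an
 * NsmPage drags each whole row through the cache just to inspect one
 * bool. */
static int
count_valid(const DsmPage *p)
{
    int n = 0;
    int i;

    for (i = 0; i < ROWS_PER_PAGE; i++)
        n += p->valid[i];
    return n;
}
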
[For more on DSM vs NSM, google: NSM n-ary DSM ]