From: | "Vladimir Sitnikov" <sitnikov(dot)vladimir(at)gmail(dot)com> |
---|---|
To: | "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au> |
Cc: | Lutischán Ferenc <lutischanf(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Improve Seq scan performance |
Date: | 2008-11-10 08:18:12 |
Message-ID: | 1d709ecc0811100018o5058377bl8fd7b2ddd7cba96a@mail.gmail.com |
Lists: pgsql-performance
>
> Maybe there's some hybrid type possible where you can scan the index to
> find large table regions that are known /not/ to contain tuples of interest
> and seek over them in your scan. I wouldn't know, really, but it sounds like
> it'd probably be more I/O than a pure seq scan (given the reading of the
> index too) unless the table had the values of interest rather neatly
> clustered. It'd also surely use more memory and CPU time processing the
> whole index to find table regions without values of interest.
>
> Is that what you meant, though?
Not exactly. I mean the following: there are cases where an index scan, even
over non-clustered values, is a clear win. Basically, it wins whenever the
number of returned rows is relatively small, whether that is due to
selectivity or due to a LIMIT clause.
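For instance (an illustrative query; seq_test and the column i follow the
test case from my earlier message):

  -- Only 20 rows are needed, so ~20 heap fetches through the index
  -- beat reading all 667 heap pages sequentially.
  select * from seq_test order by i limit 20;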
The test case that I have provided creates a 667-page table and a 30-page
index, so a complete scan of the index is 22 times cheaper in terms of I/O.
Suppose you want to find all the values that contain '%123%'. Currently
PostgreSQL will do a seq scan, while the better option might be (and it is)
to loop through all the items in the index (costing 30 I/O), find the
records that truly contain '%123%' (it will find 20 of them), and do 20 more
I/O to check tuple visibility. That is 50 I/O versus 667 for the seq scan.
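Spelled out, using the page counts above:

  full scan of the index:           30 pages
  heap fetches for the 20 matches:  20 pages
  total:                            50 I/O versus 667 for the seq scan,
                                    i.e. roughly a 13x reduction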
> A b-tree index cannot be used on a LIKE query with a leading wildcard. See
> the FAQ.
Unfortunately it is true. I would love to improve that particular case.
> In addition, if your database is not in the C locale you can't use an
> ordinary index for LIKE queries. See the FAQ. You need to create a
> text_pattern_ops index instead:
>
> create index i_ix_txt on seq_test(i text_pattern_ops);
Good catch. However, that does not change the results. PostgreSQL does the
same 2529 I/O for an index scan on '%123%', for some unknown reason.
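As a sanity check on the sizes (pg_relation_size has been available since
8.1; the names refer to the test table and index above, and 8192 assumes the
default block size):

  select pg_relation_size('seq_test') / 8192 as table_pages,
         pg_relation_size('i_ix_txt') / 8192 as index_pages;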
>
>
> set enable_seqscan=off
>> -- Index Scan reads 2529 pages for some reason. I would expect 30 (index
>> size) + 20 (number of matching entries) = 50 pages maximum, that is 10
>> times better than with seq scan.
>> Index Scan using i_ix on seq_test (cost=0.00..1643.74 rows=356 width=508)
>> (actual time=0.334..16.746 rows=20 loops=1 read_shared=2529(2529)
>> read_local=0(0) flush=0 local_flush=0 file_read=0 file_write=0)
>> Filter: (i ~~ '%123%'::text)
>> Total runtime: 16.863 ms
>>
>
> I think it's reading the whole index, because it can't do a prefix search
> if there's a leading wildcard. I'm a bit confused, though, since I thought
> in this case it couldn't actually execute the query w/o a sequential scan,
> and would just use one irrespective of the enable_seqscan param. That's what
> happens here.
Please follow the case carefully: the index is only 30 pages long. Why is
PostgreSQL doing 2529 I/O? It drives me crazy.
Regards,
Vladimir Sitnikov