Re: Improve Seq scan performance

From: "Vladimir Sitnikov" <sitnikov(dot)vladimir(at)gmail(dot)com>
To: "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>
Cc: Lutischán Ferenc <lutischanf(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Improve Seq scan performance
Date: 2008-11-10 08:18:12
Message-ID: 1d709ecc0811100018o5058377bl8fd7b2ddd7cba96a@mail.gmail.com
Lists: pgsql-performance

>
> Maybe there's some hybrid type possible where you can scan the index to
> find large table regions that are known /not/ to contain tuples of interest
> and seek over them in your scan. I wouldn't know, really, but it sounds like
> it'd probably be more I/O than a pure seq scan (given the reading of the
> index too) unless the table had the values of interest rather neatly
> clustered. It'd also surely use more memory and CPU time processing the
> whole index to find table regions without values of interest.

>
> Is that what you meant, though?

Not exactly. I mean the following: there are cases when an index scan, even
over non-clustered values, is a clear win (basically, it is a win whenever
the number of returned rows is relatively small, whether due to selectivity
or to a LIMIT clause).
The test case that I provided creates a 667-page table and a 30-page index,
so a complete scan of the index is about 22 times cheaper in terms of I/O.
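
The page counts can be checked straight from the catalog (a sketch; it
assumes the table and index from the test case are named seq_test and i_ix,
and that the statistics are fresh from a recent ANALYZE):

```sql
-- On-disk size of the heap and the index, in 8 kB pages.
-- seq_test and i_ix are the objects from the test case in this thread.
SELECT relname, relkind, relpages
FROM pg_class
WHERE relname IN ('seq_test', 'i_ix');
-- With the test data above this should report roughly 667 pages for the
-- table and 30 pages for the index.
```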

Suppose you want to find all the values that contain '%123%'. Currently
PostgreSQL will do a seq scan, while the better option might be (and it is)
to loop through all the items in the index (costing 30 I/Os), find the
records that truly contain '123' (it will find 20 of them), and do 20 I/Os
to check tuple visibility. That is 50 I/Os versus 667 for the seq scan.

> A b-tree index cannot be used on a LIKE query with a leading wildcard. See
> the FAQ.

Unfortunately it is true. I would love to improve that particular case.

> In addition, if your database is not in the C locale you can't use an
> ordinary index for LIKE queries. See the FAQ. You need to create a
> text_pattern_ops index instead:
>
> create index i_ix_txt on seq_test(i text_pattern_ops);

Good catch. However, that does not change the results: PostgreSQL performs
the same 2529 I/Os for an index scan on '%123%', for some unknown reason.
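
For what it's worth, the patched read_shared counters in the plan below
would not be needed on later servers: PostgreSQL 9.0 and newer can report
buffer usage directly via EXPLAIN (ANALYZE, BUFFERS). A sketch, reusing the
seq_test table from this thread:

```sql
-- Force the index scan and count the buffers it actually touches.
SET enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM seq_test WHERE i LIKE '%123%';
-- The "Buffers: shared hit=... read=..." line shows the pages read,
-- which should match the read_shared figure discussed below.
RESET enable_seqscan;
```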

>> set enable_seqscan=off
>> -- Index Scan reads 2529 pages for some reason. I would expect 30 (index
>> size) + 20 (number of matching entries) = 50 pages maximum, that is 10
>> times better than with seq scan.
>> Index Scan using i_ix on seq_test (cost=0.00..1643.74 rows=356 width=508)
>> (actual time=0.334..16.746 rows=20 loops=1 read_shared=2529(2529)
>> read_local=0(0) flush=0 local_flush=0 file_read=0 file_write=0)
>> Filter: (i ~~ '%123%'::text)
>> Total runtime: 16.863 ms
>
> I think it's reading the whole index, because it can't do a prefix search
> if there's a leading wildcard. I'm a bit confused, though, since I thought
> in this case it couldn't actually execute the query w/o a sequential scan,
> and would just use one irrespective of the enable_seqscan param. That's what
> happens here.

Please follow the case carefully: the index is only 30 pages long. Why is
PostgreSQL doing 2529 I/Os? It drives me crazy.

Regards,
Vladimir Sitnikov
