From: | Moritz Onken <onken(at)houseofdesign(dot)de> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Planner should use index on a LIKE 'foo%' query |
Date: | 2008-06-28 16:24:42 |
Message-ID: | 287E4428-BECC-46D6-BC47-EA94EA0121F0@houseofdesign.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hi,
I have a query
select count(*)
from result
where exists
(select * from item where item.url LIKE result.url || '%' limit 1);
which basically returns the number of items which exist in table
result and match a URL in table item by its prefix.
I read all about idexes (http://www.postgresql.org/docs/8.3/static/indexes-types.html
) and especially this part:
"The optimizer can also use a B-tree index for queries involving the
pattern matching operators LIKE and ~ if the pattern is a constant and
is anchored to the beginning of the string — for example, col LIKE 'foo
%' or col ~ '^foo', but not col LIKE '%bar'."
Since my server does not use the C locale I created the index with
CREATE INDEX test_index
ON item
USING btree
(url varchar_pattern_ops);
which works fine for queries like
SELECT distinct url from item where url like 'http://www.micro%'
limit 10;
explain analyze shows:
"Limit (cost=9.53..9.54 rows=1 width=34) (actual time=80.809..80.856
rows=10 loops=1)"
" -> Unique (cost=9.53..9.54 rows=1 width=34) (actual
time=80.806..80.835 rows=10 loops=1)"
" -> Sort (cost=9.53..9.53 rows=1 width=34) (actual
time=80.802..80.812 rows=11 loops=1)"
" Sort Key: url"
" Sort Method: quicksort Memory: 306kB"
" -> Index Scan using test_index on item
(cost=0.00..9.52 rows=1 width=34) (actual time=0.030..6.165 rows=2254
loops=1)"
" Index Cond: (((url)::text ~>=~ 'http://
www.micro'::text) AND ((url)::text ~<~ 'http://www.micrp'::text))"
" Filter: ((url)::text ~~ 'http://www.micro%'::text)"
"Total runtime: 80.908 ms"
which is great but if I run the query with the subselect it uses a
sequence scan:
select *
from result
where exists
(select * from item where item.url LIKE result.url || '%' limit 1)
limit 10;
"Limit (cost=0.00..96.58 rows=10 width=36) (actual
time=12.660..35295.928 rows=10 loops=1)"
" -> Seq Scan on result (cost=0.00..93886121.77 rows=9721314
width=36) (actual time=12.657..35295.906 rows=10 loops=1)"
" Filter: (subplan)"
" SubPlan"
" -> Limit (cost=0.00..4.81 rows=1 width=42) (actual
time=2715.061..2715.061 rows=1 loops=13)"
" -> Seq Scan on item (cost=0.00..109589.49
rows=22781 width=42) (actual time=2715.055..2715.055 rows=1 loops=13)"
" Filter: ((url)::text ~~ (($0)::text ||
'%'::text))"
"Total runtime: 35295.994 ms"
The only explaination is that I don't use a constant when comparing
the values. But actually it is a constant...
any help?
using postgres 8.3.3 on ubuntu.
Cheers,
moritz
From | Date | Subject | |
---|---|---|---|
Next Message | Steinar H. Gunderson | 2008-06-28 19:19:31 | Re: Planner should use index on a LIKE 'foo%' query |
Previous Message | Tom Lane | 2008-06-28 15:53:43 | Re: Subquery WHERE IN or WHERE EXISTS faster? |