From: | Grega Bremec <grega(dot)bremec(at)noviforum(dot)si> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Wildcard searches & performance question |
Date: | 2003-05-27 19:09:08 |
Message-ID: | 20030527210908.A571@elbereth.noviforum.si |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hello,
I have a database with the following layout:
searchterms=# \d+ searches_2002
Table "public.searches_2002"
Column | Type | Modifiers | Description
-----------+------------------------+-----------+-------------
srchdate | date | not null |
srchtime | time without time zone | not null |
client_ip | inet | not null |
srchquery | character varying(50) | not null |
fhvalue | smallint | |
Indexes: searches_2002_client_ip btree (client_ip),
searches_2002_srchdate btree (srchdate),
searches_2002_srchdatetime btree (srchdate, srchtime),
searches_2002_srchquery btree (srchquery),
searches_2002_srchquery_lcase btree (lower(srchquery)),
searches_2002_srchquery_withfh btree (srchquery, fhvalue),
searches_2002_srchtime btree (srchtime)
There are no uniqueness properties that would make it possible for this table
to have a primary key, as it is a list of searches performed on a search
engine and the users' behaviour is, well... umm, odd, to be mild. :)
Also, do note that this is a test table, so nevermind the excessive amount of
indexes - performance is not an issue here, I am still evaluating the need and
benefits of having various indexes on those columns.
The particular case I am interested is this: when executing queries involving
pattern searches using various operators on srchquery, none of the indexes are
used in certain cases, namely those LIKE and regexp filters that start with
a wildcard.
This makes perfect sense, because wildcard pattern searches that start with a
wildcard, can not really benefit from an index scan, because a sequential scan
is probably going to be faster: we are only going to benefit from scanning an
index in those special cases where the wildcard evaluates to a zero-length
string.
One example of a query plan:
searchterms=# explain select count(*)
from searches_2002
where srchquery like '%curriculum%';
QUERY PLAN
--------------------------------------------------------------------------
Aggregate (cost=4583.26..4583.26 rows=1 width=0)
-> Seq Scan on searches_2002 (cost=0.00..4583.26 rows=1 width=0)
Filter: (srchquery ~~ '%curriculum%'::text)
There is 211061 records in this test table, but the real-life tables would
contain a much much larger amount of data, more like 50+ million rows.
This promise of a hell on earth trying to optimize performance makes me wonder:
would there be a sensible way/reason for avoiding sequential scans on queries
that start with a wildcard, and would avoiding sequential scans even be
feasible in such cases? Or in other words, can I somehow optimize LIKE and
regexp queries that start with a wildcard?
TIA,
--
Grega Bremec
System Administration & Development Support
grega.bremec-at-noviforum.si
http://najdi.si/
http://www.noviforum.si/
From | Date | Subject | |
---|---|---|---|
Next Message | Andrew Sullivan | 2003-05-27 19:37:00 | Re: Wildcard searches & performance question |
Previous Message | Tom Lane | 2003-05-23 20:34:48 | Simplifying varchar and bpchar behavior |