Quick Links

Re: LIKE search and performance

From:	"Alexander Staubo" <alex(at)purefiction(dot)net>
To:	Andy <frum(at)ar-sd(dot)net>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: LIKE search and performance
Date:	2007-05-23 16:05:26
Message-ID:	88daf38c0705230905u1a981102k691b47bd36011a80@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On 5/23/07, Andy <frum(at)ar-sd(dot)net> wrote:
> An example would be:
> SELECT * FROM table
> WHERE name like '%john%' or street like '%srt%'
>
> Anyway, the query planner always does seq scan on the whole table and that
> takes some time. How can this be optimized or made in another way to be
> faster?

There's no algorithm in existence that can "index" arbitrary
substrings the way you think. The only rational way to accomplish this
is to first break the text into substrings using some algorithm (eg.,
words delimited by whitespace and punctuation), and index the
substrings individually.

You can do this using vanilla PostgreSQL, and you can use Tsearch2
and/or its GIN indexes to help speed up the searches. The simplest
solution would be to put all the substrings in a table, one row per
substring, along with an attribute referencing the source table/row.

At this point you have effectively reduced your search space: you can
use a query to isolate the words in your "dictionary" that contain the
substrings. So a query might be:

select ... from persons where id in (
select person_id from person_words
where word like '%john%';
)

The "like" search, even though it will use a sequential scan, is bound
to be much faster on these small words than searching for substrings
through large gobs of text in the persons table.

Note that PostgreSQL *can* exploit the index for *prefix* matching, if
you tell it to use the right operator class:

create index persons_name_index on persons (name text_pattern_ops);

or, if you're using varchars (though there is rarely any reason to):

create index persons_name_index on persons (name varchar_pattern_ops);

(These two may be identical internally. Check the docs.)

Now you can do:

select ... from persons where name like 'john%';

which will yield a query plan such as this:

Index Scan using persons_name_index on persons (cost=0.00..8.27
rows=1 width=29) (actual time=0.184..0.373 rows=51 loops=1)
Index Cond: ((name ~>=~ 'john'::text) AND (name ~<~ 'joho'::text))
Filter: (title ~~ 'john%'::text)

Alexander.

In response to

LIKE search and performance at 2007-05-23 15:08:49 from Andy

Responses

Re: LIKE search and performance at 2007-05-24 18:50:29 from James Mansion

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Y Sidhu	2007-05-23 16:22:22	max_fsm_pages, shared_buffers and checkpoint_segments
Previous Message	Guido Neitzer	2007-05-23 16:00:18	Re: LIKE search and performance