Re: Creating an index-type for LIKE '%value%'

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: CG <cgg007(at)yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Creating an index-type for LIKE '%value%'
Date: 2005-02-07 18:07:15
Message-ID: 20050207180710.GE20493@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Feb 07, 2005 at 09:28:24AM -0800, CG wrote:
> As I was exploring ways to optimize my application's use of the database, which
> has to run the horrid "SELECT * FROM table WHERE field LIKE '%value%';" in
> places, I thought this solution could be built upon to allow for an easier
> deployment.

<snip>

> AFAICT, the "right" way to do this would be to create an index type which would
> take care of splitting the varchar field, and to have the query planner use the
> index in a SELECT when a "LIKE '%value%'" is used in the WHERE clause.
>
> Tsearch2 is fantastic, but it works best for fields that contain words. I have
> to sift through alphanumeric identification numbers.

Seems to me to depends quite a bit or your problem domain. How big are
the string's you're searching. If you're not searching on word
boundaries like tsearch, you'd need to split on every char. Say you
split on three character blocks. So the string "Hello World" would need
entries for:

"Hel", "ell", "llo", "lo ", "o W", " Wo", "Wor", "orl", "rld"

For N character strings you'd need N-2 entries. If you're storing
entire documents it's not practical. But if all your strings are maybe
15 characters long (maybe serial numbers), it might be practical.

I havn't looked at tsearch but maybe you can customise it to your
needs. If you can redefine the split function you could make it work
appropriately. Then you can define the ~~ operator (which is LIKE) to
call tsearch.

This in just off the top of my head, but maybe it can work.

Hope this helps,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2005-02-07 18:09:24 Re: Creating an index-type for LIKE '%value%'
Previous Message Howard Cole 2005-02-07 17:35:46 tsearch2 Trigger