Re: Creating an index-type for LIKE '%value%'

From: Yury Don <yura(at)vpcit(dot)ru>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Creating an index-type for LIKE '%value%'
Date: 2005-02-07 18:16:30
Message-ID: 1915284578.20050207231630@vpcit.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello CG,

Monday, February 7, 2005, 10:28:24 PM, you wrote:

C> Return-Path: <pgsql-general-owner+M73162(at)postgresql(dot)org>
C> Delivered-To: yura(at)vpcit(dot)ru
C> Received: (qmail 15486 invoked from network); 7 Feb 2005 17:36:10 -0000
C> Received: from svr4.postgresql.org (66.98.251.159)
C> by ns.vpcit.ru with SMTP; 7 Feb 2005 17:36:09 -0000
C> Received: from postgresql.org (svr1.postgresql.org [200.46.204.71])
C> by svr4.postgresql.org (Postfix) with ESMTP id 91E355AFB96;
C> Mon, 7 Feb 2005 17:35:38 +0000 (GMT)
C> X-Original-To:
C> pgsql-general-postgresql(dot)org(at)localhost(dot)postgresql(dot)org
C> Received: from localhost (unknown [200.46.204.144])
C> by svr1.postgresql.org (Postfix) with ESMTP id AB6DB8B9C8E
C> for
C> <pgsql-general-postgresql(dot)org(at)localhost(dot)postgresql(dot)org>; Mon, 7
C> Feb 2005 17:28:41 +0000 (GMT)
C> Received: from svr1.postgresql.org ([200.46.204.71])
C> by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
C> with ESMTP id 86703-06
C> for <pgsql-general-postgresql(dot)org(at)localhost(dot)postgresql(dot)org>;
C> Mon, 7 Feb 2005 17:28:27 +0000 (GMT)
C> Received: from web13811.mail.yahoo.com (web13811.mail.yahoo.com [216.136.175.219])
C> by svr1.postgresql.org (Postfix) with SMTP id 282268B9B41
C> for <pgsql-general(at)postgresql(dot)org>; Mon, 7 Feb 2005 17:28:26 +0000 (GMT)
C> Received: (qmail 27996 invoked by uid 60001); 7 Feb 2005 17:28:25 -0000
C> Comment: DomainKeys? See http://antispam.yahoo.com/domainkeys
C> DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws;
C> s=s1024; d=yahoo.com;
C>
C> b=RGAPPVsUjH1PXAVx5YgEkzrYoDPXlWw1QqdzAqR8VkgnmIBdcEWfH8poGpOiBJZd3dDCXObCkh9PoSlR0m1B5BaGO1hPPVDY5Ypl3NLL3lwhAhLaEGhHT25sPztygaIZUyNbYalfrQZLjOl7P3ZSTu7uqsiaqrI56gAntgyCZIQ=
C> Message-ID: <20050207172824(dot)27994(dot)qmail(at)web13811(dot)mail(dot)yahoo(dot)com>
C> Received: from [216.173.173.66] by web13811.mail.yahoo.com via
C> HTTP; Mon, 07 Feb 2005 09:28:24 PST
C> Date: Mon, 7 Feb 2005 09:28:24 -0800 (PST)
C> From: CG <cgg007(at)yahoo(dot)com>
C> Reply-To: cgg007(at)yahoo(dot)com
C> Subject: [GENERAL] Creating an index-type for LIKE '%value%'
C> To: pgsql-general(at)postgresql(dot)org
C> In-Reply-To: <5349(dot)1107713905(at)sss(dot)pgh(dot)pa(dot)us>
C> MIME-Version: 1.0
C> Content-Type: text/plain; charset=us-ascii
C> X-Virus-Scanned: by amavisd-new at hub.org
C> X-Spam-Status: No, hits=0.89 tagged_above=0 required=5
C> tests=DNS_FROM_RFC_ABUSE, FROM_ENDS_IN_NUMS
C> X-Spam-Level:
C> X-Mailing-List: pgsql-general
C> Precedence: bulk
C> Sender: pgsql-general-owner(at)postgresql(dot)org

C> Once upon a time there was an FTI contrib module that split up a varchar field
C> into little bits and placed them into an FTI table to facilitate a full text
C> index search. It was like being able to do a "SELECT * FROM table WHERE field
C> LIKE '%value%';" and have it search an index!

C> It was a great idea! What a pain it was to implement!

C> You see, a trigger had to be run on INSERT and UPDATE to split up the varchar
C> field into little pieces. On DELETE you'd have to clear out the rows from the
C> FTI table. And when you wanted to use the FTI table in a SELECT you had to
C> write your SQL to join up that FTI table and dig through it.

C> As I was exploring ways to optimize my application's use of the database, which
C> has to run the horrid "SELECT * FROM table WHERE field LIKE '%value%';" in
C> places, I thought this solution could be built upon to allow for an easier
C> deployment.

C> AFAICT, the "right" way to do this would be to create an index type which would
C> take care of splitting the varchar field, and to have the query planner use the
C> index in a SELECT when a "LIKE '%value%'" is used in the WHERE clause.

C> Tsearch2 is fantastic, but it works best for fields that contain words. I have
C> to sift through alphanumeric identification numbers.

C> Is the split-field FTI the best way to tackle my problem?

C> What can I do to get better performance on "SELECT * FROM table WHERE field
C> LIKE '%value%';" ??

C> CG

We use for this type ltree from contrib.
For example you have a table with column named f1. Add a column
f1_ltree of type ltree and fill it in trigger taking value of f1 and
cracting tree with every character as node. Create index for table on
f2 using gist. For example, for f1='abcde' f2 will be 'a.b.c.d.e'. Below
is example of function for transforming text to ltree. And then you
can search "f2 ~ '*.b.c.d.*'" instead of "f1 like '%bcd%'" and it will be
index search. It's possible also not to create additional column and
create index on charsltree(f1) and search as "charsltree(f1) ~
'*.b.c.d.*'"

create or replace function charsltree(text) returns ltree as '
declare
str alias for $1;
res text;
i smallint;
begin
if $1 is null or $1 = '''' then
return null;
end if;
res = '''';
for i in 1 .. length(str) loop
res = res || substr(str, i, 1) || ''.'';
end loop;
return text2ltree(btrim(res, ''.''));
end;
' immutable language plpgsql;

--
Best regards,
Yury mailto:yura(at)vpcit(dot)ru

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Karl O. Pinc 2005-02-07 18:18:09 Permissions on implicit SERIAL datatype seqeuences
Previous Message Tom Lane 2005-02-07 18:09:24 Re: Creating an index-type for LIKE '%value%'