Re: Behavior of a pg_trgm index for 2 (or < 3) character LIKE queries

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Behavior of a pg_trgm index for 2 (or < 3) character LIKE queries
Date: 2013-05-31 17:53:04
Message-ID: CAPpHfdtFuaootX48f1kbLeMcuYOo16JBpO94agKCqF8VJMngLA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, May 31, 2013 at 9:41 PM, Amit Langote <amitlangote09(at)gmail(dot)com>wrote:

> On Sat, Jun 1, 2013 at 1:48 AM, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>
> wrote:
> > On Fri, May 31, 2013 at 11:16 AM, Amit Langote <amitlangote09(at)gmail(dot)com>
> wrote:
> >> 2) And if that is so, is there problem in gin_extract_query_trgm(),
> >> that is while generating trigrams from a query search term that causes
> >> trigrams (stored in the index if answer to (1) is yes) NOT to be used
> >> in such a partial matching case?
> >
> > it means that we can't use trigrams in case of partial matching
> > because trigrams (stored in index) are converted to different
> > value(CRC).
> > right?
> >
>
> When I debugged a partial match case such as " column like '%st%'
> ", it appears that get_wildcard_trigrams return no trigrams for
> wildcard part 'st' since charlen < 3. Hence, GIN_SEARCH_MODE_ALL mode
> is used and results in full index scan instead of trigrams being
> used. This happens for multibyte case too. The problem is that for
> wildcard part consisting of less than 3 characters,
> get_wildcard_trigrams return nothing.
>

Partial match for LIKE queries is not implemented at all. With current
index structure it could be possible to implement it with guarantee that
all indexed strings don't contain multibyte characters (i.e. single-byte
encoding).
Also, at it was mentioned, it's possible to implement operator class with
text storage type which would support partial match in all the cases.
However, I doubt it's reasonable to implement it based on pg_trgm, because
of many hard-wired assumptions that trigram is fixed length and
compatibility.

------
With best regards,
Alexander Korotkov.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2013-05-31 18:00:19 Re: removing PD_ALL_VISIBLE
Previous Message Heikki Linnakangas 2013-05-31 17:46:42 Re: detecting binary backup in progress