Behavior of a pg_trgm index for 2 (or < 3) character LIKE queries

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Behavior of a pg_trgm index for 2 (or < 3) character LIKE queries
Date: 2013-05-30 02:51:27
Message-ID: CA+HiwqH7F1xZAq7KeRGOzRYBruiMb++wMQfxxXPLMiAv_wLJkg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

I have been trying to understand how pg_trgm works. As part of that, I
was looking at gin_extract_query_trgm(), which I think, extracts
trigrams from a search query string. So, I debugged for 3 cases:

1) column_name LIKE '%緊急%'

in this case, inside gin_extract_query_trgm(), after a call to
generate_wildcard_trgm(), returned trglen is 0, hence
GIN_SEARCH_MODE_ALL search mode is used.

2) column_name LIKE '%os%'

same as in case (1)

3) column_name LIKE '%ost%'

returned trglen is > 0, things proceed differently. May be, trigrams
have been generated and cane be used for index search.

I later commented out #define KEEPONLYALNUM from
contrib/pg_trgm/trgm.h (following from a related discussion on
-hackers viz. http://www.postgresql.org/message-id/flat/CAHGQGwFJshvV2nGME19wdTW9teFw_w7h2ns4E+YYsjkB9WdWDQ(at)mail(dot)gmail(dot)com#CAHGQGwFJshvV2nGME19wdTW9teFw_w7h2ns4E+YYsjkB9WdWDQ@mail.gmail.com),
but things didn't change.

So, it appears, for search strings consisting of 2 (or < 3)
characters, trigrams can not be utilized. No?

NOTE: Using the master branch. The indexed column is a text field and
data consists of mix of Japanese, alphanumeric characters.

--
Amit Langote

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2013-05-30 03:01:31 Re: all_visible replay aborting due to uninitialized pages
Previous Message Robert Haas 2013-05-30 02:46:58 Re: removing PD_ALL_VISIBLE