From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: pg_trgm partial-match |
Date: | 2012-11-19 06:05:12 |
Message-ID: | CAPpHfdvuQPDgckJZWcgq=ggQGO1whvPQBKp+pEdzyW=4jVyK=Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi!
On Thu, Nov 15, 2012 at 11:39 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Note that we cannot do a partial-match if KEEPONLYALNUM is disabled,
> i.e., if query key contains multibyte characters. In this case, byte
> length of
> the trigram string might be larger than three, and its CRC is used as a
> trigram key instead of the trigram string itself. Because of using CRC, we
> cannot do a partial-match. Attached patch extends pg_trgm so that it
> compares a partial-match query key only when KEEPONLYALNUM is
> enabled.
>
Didn't get this point. How does KEEPONLYALNUM guarantee that each trigram
character is singlebyte?
CREATE TABLE test (val TEXT);
INSERT INTO test VALUES ('aa'), ('aaa'), ('шaaш');
CREATE INDEX trgm_idx ON test USING gin (val gin_trgm_ops);
ANALYZE test;
test=# SELECT * FROM test WHERE val LIKE '%aa%';
val
------
aa
aaa
шaaш
(3 rows)
test=# set enable_seqscan = off;
SET
test=# SELECT * FROM test WHERE val LIKE '%aa%';
val
-----
aa
aaa
(2 rows)
I think we can use partial match only for singlebyte encodings. Or, at
most, in cases when all alpha-numeric characters are singlebyte (have no
idea how to check this).
------
With best regards,
Alexander Korotkov.
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2012-11-19 06:46:31 | Re: [PATCH 13/14] Introduce pg_receivellog, the pg_receivexlog equivalent for logical changes |
Previous Message | JiangGuiqing | 2012-11-19 06:03:31 | [PATCH] Patch to fix missing libecpg_compat.lib and libpgtypes.lib. |