From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: pg_trgm partial-match |
Date: | 2012-11-19 10:55:08 |
Message-ID: | CAPpHfdtTc2UqLXu98LMxPNdOhuxnSZrPPv2xv8i10SY+CGnaFg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Nov 19, 2012 at 10:05 AM, Alexander Korotkov
<aekorotkov(at)gmail(dot)com>wrote:
> On Thu, Nov 15, 2012 at 11:39 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>wrote:
>
>> Note that we cannot do a partial-match if KEEPONLYALNUM is disabled,
>> i.e., if query key contains multibyte characters. In this case, byte
>> length of
>> the trigram string might be larger than three, and its CRC is used as a
>> trigram key instead of the trigram string itself. Because of using CRC, we
>> cannot do a partial-match. Attached patch extends pg_trgm so that it
>> compares a partial-match query key only when KEEPONLYALNUM is
>> enabled.
>>
>
> Didn't get this point. How does KEEPONLYALNUM guarantee that each trigram
> character is singlebyte?
>
> CREATE TABLE test (val TEXT);
> INSERT INTO test VALUES ('aa'), ('aaa'), ('шaaш');
> CREATE INDEX trgm_idx ON test USING gin (val gin_trgm_ops);
> ANALYZE test;
> test=# SELECT * FROM test WHERE val LIKE '%aa%';
> val
> ------
> aa
> aaa
> шaaш
> (3 rows)
> test=# set enable_seqscan = off;
> SET
> test=# SELECT * FROM test WHERE val LIKE '%aa%';
> val
> -----
> aa
> aaa
> (2 rows)
>
> I think we can use partial match only for singlebyte encodings. Or, at
> most, in cases when all alpha-numeric characters are singlebyte (have no
> idea how to check this).
>
Actually, I also was fiddling around idea of partial match on trigrams when
I was working on initial LIKE patch. But, I concluded that we would need a
separate opclass which always keeps full trigram in entry.
------
With best regards,
Alexander Korotkov.
From | Date | Subject | |
---|---|---|---|
Next Message | Jeevan Chalke | 2012-11-19 10:59:51 | Re: too much pgbench init output |
Previous Message | Andres Freund | 2012-11-19 08:50:30 | Re: logical changeset generation v3 |