Re: pg_trgm: unicode string not working

From: Florian Pflug <fgp(at)phlo(dot)org>
To: sushant354(at)gmail(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_trgm: unicode string not working
Date: 2011-06-12 12:40:06
Message-ID: 0A0622C3-C778-4C6A-9345-0A6D81929BCF@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi

Next time, please post questions regarding the usage of postgres
to the -general list, not to -hackers. The purpose of -hackers is
to discuss the development of postgres proper, not the development
of applications using postgres.

On Jun12, 2011, at 13:33 , Sushant Sinha wrote:
> I am using pg_trgm for spelling correction as prescribed in the
> documentation. But I see that it does not work for unicode sring. The
> database was initialized with utf8 encoding and the C locale.

I think you need to use a locale (more precisely, a CTYPE) in which
'क', 'त', 'द' are considered to be alphanumeric.

You can specify the CTYPE when creating the database with
CREATE DATABASE ... LC_CTYPE = ...

> Here is the table:
> \d words
> Table "public.words"
> Column | Type | Modifiers
> --------+---------+-----------
> word | text |
> ndoc | integer |
> nentry | integer |
> Indexes:
> "words_idx" gin (word gin_trgm_ops)
>
> Query: select word from words where word % 'कतद';
>
> I get an error:
>
> ERROR: GIN indexes do not support whole-index scans

pg_trgm probably ignores non-alphanumeric characters during
comparison, so you end up with an empty search string, which
translates to a whole-index scan. Postgres up to 9.0 does
not support such scans for GIN indices.

Note that this restriction was removed in postgres 9.1 which
is currently in beta. However, GIT indices must be re-created
with REINDEX after upgrading from 9.0 to leverage that
improvement.

best regards.
Florian Pflug

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-06-12 14:12:36 Re: Creating new remote branch in git?
Previous Message Florian Pflug 2011-06-12 11:53:08 Re: Range Types and extensions