Re: unaccent

From: nngodinh(at)tiscali(dot)it
To: "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: unaccent
Date: 2002-09-18 12:37:40
Message-ID: 3D6DC6360002008F@mail-1.tiscalinet.it
Lists: pgsql-hackers

The simplest way to use it: if you want to index the table "titles", and
"title" is the field containing the text to be indexed, add a second field,
for instance "utitle", to hold the unaccented text, and fill it with:

UPDATE titles SET utitle = unac(title);

Of course you can have a trigger function do this instead, so that utitle
follows every change to title. You can then use utitle with txt2txtidx and
tsearch.
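
For example, a trigger along these lines could keep utitle up to date. This
is only a sketch: the function and trigger names are made up for
illustration, and the syntax assumes PostgreSQL 8.0 or later (dollar quoting,
RETURNS trigger), so it may need adapting for older releases.

```sql
-- Hypothetical trigger function: recompute the unaccented copy of "title"
-- on every insert or update. unac() is the C function from pg_unac.so.
CREATE FUNCTION titles_unaccent() RETURNS trigger AS $$
BEGIN
    NEW.utitle := unac(NEW.title);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Hypothetical trigger name; fires before the row is written so that the
-- modified NEW row is what gets stored.
CREATE TRIGGER titles_unaccent_trg
    BEFORE INSERT OR UPDATE ON titles
    FOR EACH ROW EXECUTE PROCEDURE titles_unaccent();
```

With this in place, the one-off UPDATE above is only needed for rows that
existed before the trigger was created.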

Another solution is to generate the txtidx field (e.g. titleidx) directly
using unac:

UPDATE titles SET titleidx = txt2txtidx(unac(title));

The problem is that I have not managed to use it with tsearch, because
(of course) tsearch doesn't allow functions as parameters. That is why my
first idea was to integrate unac into tsearch.

Bye.

>-- Original Message --
>Date: Wed, 18 Sep 2002 15:08:59 +0300 (GMT)
>From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
>To: nngodinh(at)tiscali(dot)it
>Cc: pgsql-hackers(at)postgresql(dot)org
>Subject: Re: [HACKERS] unaccent
>
>
>On Wed, 18 Sep 2002 nngodinh(at)tiscali(dot)it wrote:
>
>> Greetings,
>>
>> Since I use the txtidx data structure together with GiST indexing to build
>> a word index of a very large Unicode db, I've implemented a PostgreSQL
>> function that uses libunac to unaccent TEXT fields.
>>
>> The resulting text is in UTF-8, but you can change that in the sources by
>> setting an appropriate value (using iconv charset names).
>>
>> Get libunac from: http://www.nongnu.org/unac/ (it uses iconv)
>>
>> Extract the archive, compile it (make). Move pg_unac.so to your postgresql
>> shared libraries dir.
>>
>> Link it in postgresql:
>>
>> CREATE FUNCTION unac(TEXT) RETURNS TEXT AS 'path_to_pg_unac.so'
>> LANGUAGE C;
>>
>> What about integrating unaccent libraries directly into tsearch? It is
>> useful for French search engines (for instance).
>
>I think it is better to have a separate module, contrib/unac, and to
>document how to use it with tsearch. Please write us a couple of lines about
>using your function and we'll add them to the tsearch documentation.
>
>btw, use palloc instead of malloc in PostgreSQL functions.
>
>>
>> Bye.
>>
>> Nhan NGO DINH
>>
>>
>> __________________________________________________________________
>> Tiscali Ricaricasa
>> la prima prepagata per navigare in Internet a meno di un'urbana e
>> risparmiare su tutte le tue telefonate. Acquistala on line e non avrai
>> nessun costo di attivazione né di ricarica!
>> http://ricaricasaonline.tiscali.it/
>>
>>
>>
>>
>
> Regards,
> Oleg
>_____________________________________________________________
>Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
>Sternberg Astronomical Institute, Moscow University (Russia)
>Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
>phone: +007(095)939-16-83, +007(095)939-23-83
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 3: if posting/reading through Usenet, please send an appropriate
>subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
>message can get through to the mailing list cleanly

