unaccent

From: nngodinh(at)tiscali(dot)it
To: pgsql-hackers(at)postgresql(dot)org
Subject: unaccent
Date: 2002-09-18 10:14:49
Message-ID: 3D6DC6360001FB32@mail-1.tiscalinet.it
Lists: pgsql-hackers

Greetings,

Since I use the txtidx data type together with GiST indexing to build a
word index over a very large Unicode database, I've implemented a PostgreSQL
function that uses libunac to remove accents from TEXT fields.

The resulting text is in UTF-8, but you can change the output charset in the
sources by setting an appropriate value (using iconv charset names).
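
To give an idea of what unaccenting does to a string, here is a rough sketch
in Python. It is not the code from pg_unac and not libunac's algorithm: it
assumes that decomposing each character (Unicode NFKD) and dropping the
combining marks is close enough for illustration. The function name and the
charset parameter are hypothetical; the parameter only mirrors the idea of a
configurable output charset.

```python
import unicodedata


def unaccent(text: str, charset: str = "utf-8") -> bytes:
    """Strip accents from text and encode the result in the given charset.

    Illustrative only: relies on Unicode decomposition, not on libunac.
    """
    # Split each accented character into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only the characters that are not combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Encode in the requested output charset (UTF-8 by default).
    return stripped.encode(charset)


print(unaccent("élève"))  # b'eleve'
```

Letters that have no decomposition (a stroked D, for example) pass through
unchanged with this approach, so a dedicated library may behave differently.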

Get libunac from: http://www.nongnu.org/unac/ (it uses iconv)

Extract the attached pg_unac archive and compile it (make). Move pg_unac.so
to your PostgreSQL shared library directory.

Register it in PostgreSQL:

CREATE FUNCTION unac(TEXT) RETURNS TEXT AS 'path_to_pg_unac.so' LANGUAGE C;

What about integrating an unaccent library directly into tsearch? It would be
useful for French search engines, for instance.

Bye.

Nhan NGO DINH

Attachment Content-Type Size
pg_unac-1.0.tar.gz application/x-gzip-compressed 728 bytes
