From: | nngodinh(at)tiscali(dot)it |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | unaccent |
Date: | 2002-09-18 10:14:49 |
Message-ID: | 3D6DC6360001FB32@mail-1.tiscalinet.it |
Lists: | pgsql-hackers |
Greetings,
Since I use the txtidx data type together with GiST indexing to build a
word index over a very large Unicode database, I've implemented a PostgreSQL
function that uses libunac to unaccent TEXT fields.
The resulting text is in UTF-8, but you can change that in the sources by
setting an appropriate value (using iconv charset names).
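
To illustrate what "unaccenting" means here, the following is a minimal Python sketch. It is not the code in pg_unac and does not call libunac; it swaps in a different technique, namely Unicode NFKD decomposition followed by dropping the combining marks. The function names unaccent and unaccent_utf8 and the charset parameter are hypothetical, chosen only to mirror the behaviour described above (input in some charset, output in UTF-8).

```python
import unicodedata


def unaccent(text: str) -> str:
    """Remove diacritics by decomposing characters and dropping combining marks.

    Assumption: this is a stand-in for what libunac does, not its algorithm.
    """
    # NFKD splits e.g. "é" into "e" followed by U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFKD", text)
    # unicodedata.combining() returns a non-zero class for combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def unaccent_utf8(data: bytes, charset: str = "utf-8") -> bytes:
    """Decode bytes from `charset`, unaccent them, and return UTF-8 bytes.

    The `charset` argument is hypothetical; it plays the role of the
    iconv charset name mentioned in the message.
    """
    return unaccent(data.decode(charset)).encode("utf-8")


if __name__ == "__main__":
    print(unaccent("été français"))
    print(unaccent_utf8("café".encode("latin-1"), charset="latin-1"))
```

Characters that have no Unicode decomposition (for example "ø") pass through this sketch unchanged, so it should not be read as equivalent to a dedicated unaccent library.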
Get libunac from: http://www.nongnu.org/unac/ (it uses iconv)
Extract the archive, compile it (make), and move pg_unac.so to your
PostgreSQL shared library directory.
Register it in PostgreSQL:
CREATE FUNCTION unac(TEXT) RETURNS TEXT AS 'path_to_pg_unac.so' LANGUAGE C;
What about integrating unaccent libraries directly into tsearch? It would be
useful for French search engines (for instance).
Bye.
Nhan NGO DINH
Attachment | Content-Type | Size |
---|---|---|
pg_unac-1.0.tar.gz | application/x-gzip-compressed | 728 bytes |