From: | Michael Fuhr <mike(at)fuhr(dot)org> |
---|---|
To: | Hannes Dorbath <light(at)theendofthetunnel(dot)de> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: TSearch2: Auto identify document language? |
Date: | 2005-12-11 17:04:38 |
Message-ID: | 20051211170438.GA77947@winnie.fuhr.org |
Lists: | pgsql-general |
On Sun, Dec 11, 2005 at 01:17:42PM +0100, Hannes Dorbath wrote:
> Is there a practical way to make a guess what language a document is
> written in and auto magically use the adequate TSearch config? I thought
> of looking up the document's words in various dicts and use the one with
> the most matches.. doesn't matter if performance will be bad.

I don't know how easily you could incorporate this into tsearch2,
but for the general problem of language identification you could
try something like Perl's Lingua::Identify module:

http://search.cpan.org/dist/Lingua-Identify/lib/Lingua/Identify.pm

CREATE FUNCTION langof(text) RETURNS text AS $$
    use Lingua::Identify qw(:language_identification);
    return langof($_[0]);
$$ LANGUAGE plperlu IMMUTABLE STRICT;

SELECT langof('The quick brown fox jumped over the lazy dog.');
 langof
--------
 en
(1 row)

SELECT langof('Der schnelle braune Fuchs sprang über den faulen Hund.');
 langof
--------
 de
(1 row)

SELECT langof('El zorro marrón rápido saltó sobre el perro perezoso.');
 langof
--------
 es
(1 row)

SELECT langof('La volpe marrone rapida ha saltato sopra il cane pigro.');
 langof
--------
 it
(1 row)

SELECT langof('Le renard brun rapide a sauté par-dessus le chien paresseux.');
 langof
--------
 fi
(1 row)

Language identification isn't always accurate -- in this example
the function thinks the last text is Finnish instead of French --
but it might get better with more text to examine, and you can tell
Lingua::Identify which languages to consider or ignore.
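
As a rough, untested sketch of that last point: a variant of the
function could take a comma-separated list of language codes and
restrict Lingua::Identify to them. The name langof_among and its
second argument are made up for illustration, and the sketch assumes
the module's :language_manipulation export tag provides
get_active_languages and set_active_languages as its documentation
describes.

```sql
-- Hypothetical helper (not part of Lingua::Identify or tsearch2):
-- identify the language of $1, considering only the codes listed in $2.
CREATE FUNCTION langof_among(text, text) RETURNS text AS $$
    use Lingua::Identify qw(:language_identification :language_manipulation);

    my ($doc, $codes) = @_;

    # The active-language set is global to the session's Perl interpreter,
    # so remember it and restore it after the lookup.
    my @previous = get_active_languages();
    set_active_languages(split /\s*,\s*/, $codes);

    # In scalar context langof() returns the most probable language code.
    my $lang = langof($doc);

    set_active_languages(@previous);
    return $lang;
$$ LANGUAGE plperlu STRICT;

-- Only consider the five languages used in the examples above.
SELECT langof_among('Le renard brun rapide a sauté par-dessus le chien paresseux.',
                    'en,de,es,it,fr');
```

Narrowing the candidates leaves fewer ways for a short text to be
misidentified, but it is no guarantee; a single sentence can still
come back with the wrong code.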
--
Michael Fuhr