From: | Bibi Mansione <golgote(at)gmail(dot)com> |
---|---|
To: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Hunspell as filtering dictionary |
Date: | 2019-11-05 14:42:17 |
Message-ID: | CACZ67_U8Vu66-kPRj_v2icmn_wmz9_LDM8Tv_tvptKKwBXD2tQ@mail.gmail.com |
Lists: | pgsql-general |
Hi,
I am trying to create a tsvector from French text. These are the
operations that seem logical to perform, in this order:
1. remove stopwords
2. use hunspell to find word roots
3. unaccent
I first tried:
```sql
CREATE TEXT SEARCH CONFIGURATION fr_conf (copy='simple');
ALTER TEXT SEARCH CONFIGURATION fr_conf
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH unaccent, french_hunspell;
select * from to_tsvector('fr_conf', E'Pour découvrir et rencontrer l\'aventure.');
-- 'aventure':5 'aventurer':5 'rencontrer':3
```
But the verb "découvrir" is missing :(
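One way to see where "découvrir" gets lost is to ask `ts_debug` which dictionary handled each token, and to call each dictionary on its own with `ts_lexize`. This is a diagnostic sketch that assumes the `fr_conf` configuration above and a `french_hunspell` dictionary installed as in the original setup. The output is not shown because it depends on the installed dictionary files.

```sql
-- Per-token trace: token type, the dictionaries consulted,
-- the dictionary that produced the result, and the resulting lexemes.
SELECT alias, token, dictionaries, dictionary, lexemes
FROM ts_debug('fr_conf', E'Pour découvrir et rencontrer l\'aventure.');

-- Call each dictionary directly to check the hand-off:
-- unaccent should strip the accent...
SELECT ts_lexize('unaccent', 'découvrir');
-- ...and this shows whether french_hunspell knows the unaccented form
-- (NULL means "not recognized").
SELECT ts_lexize('french_hunspell', 'decouvrir');
```

If the last call returns NULL, that would be consistent with the documented chain behaviour. `unaccent` is a filtering dictionary, so its output is passed on to `french_hunspell`. A token that no dictionary in the list recognizes is then discarded.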
If I try with french_hunspell only, I get it, but with the accent:
```sql
select * from to_tsvector('french_hunspell', E'Pour découvrir et rencontrer l\'aventure.');
-- 'aventure':6 'aventurer':6 'découvrir':2 'rencontrer':4
```
I also tried:
```sql
CREATE TEXT SEARCH CONFIGURATION fr_conf2 (copy='simple');
ALTER TEXT SEARCH CONFIGURATION fr_conf2
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH french_hunspell, unaccent;
select * from to_tsvector('fr_conf2', E'Pour découvrir et rencontrer l\'aventure.');
-- 'aventure':5 'aventurer':5 'rencontrer':3
```
But I guess unaccent is never called.
I believe this is because french_hunspell is not a filtering dictionary,
but I might be wrong. So is there a way, with any FTS configuration
(existing or new), to get this result:
```
-- 'aventure':6 'aventurer':6 'decouvrir':2 'rencontrer':4
```
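A minimal sketch of one possible workaround, outside the configuration itself. Let `french_hunspell` produce the lexemes first, then strip accents from each lexeme afterwards while keeping its positions. It assumes the `unaccent` extension, which provides an `unaccent(text)` function alongside the dictionary, and the `french_hunspell` configuration used above. It also uses `unnest(tsvector)` (PostgreSQL 9.6+). This is an untested illustration, not a recipe.

```sql
-- Rebuild the tsvector from unaccented lexemes, keeping positions.
-- Assumes standard_conforming_strings = on (the default), so '\' is
-- a single backslash and '\\' is two.
SELECT coalesce(
         string_agg(
           -- one 'lexeme':pos1,pos2 entry; backslashes and quotes are
           -- doubled as the tsvector input syntax requires
           '''' || replace(replace(unaccent(u.lexeme), '\', '\\'), '''', '''''')
             || ''':' || array_to_string(u.positions, ','),
           ' '),
         '')::tsvector AS unaccented
FROM unnest(to_tsvector('french_hunspell',
                        E'Pour découvrir et rencontrer l\'aventure.')) AS u;
```

Weights are not carried over, so every entry gets the default weight D, the same as plain `to_tsvector` output. If two lexemes become identical after unaccenting, tsvector input should merge them into one entry with combined positions.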
Thanks,
Bertrand