From: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
---|---|
To: | Alban Hertroys <a(dot)hertroys(at)magproductions(dot)nl> |
Cc: | Postgres General <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Tsearch2 Dutch snowball stemmer in PG8.1 |
Date: | 2007-10-03 12:32:55 |
Message-ID: | Pine.LNX.4.64.0710031630410.3304@sn.sai.msu.ru |
Lists: | pgsql-general |
Alban,
the documentation you're referring to is for the upcoming 8.3 release.
For 8.1 and 8.2 you need to set up all the machinery by hand. It's not
difficult; for example:
-- sample tsearch2 configuration for search.postgresql.org
-- Creates configuration 'pg' as the default; its locale must match the server's locale!
-- Change 'ru_RU.UTF-8' below to your server's locale
begin;
-- create special (default) configuration 'pg'
update pg_ts_cfg set locale=NULL where locale = 'ru_RU.UTF-8';
insert into pg_ts_cfg values('pg','default','ru_RU.UTF-8');
-- register 'pg_dict' dictionary using synonym template
-- postgres pg
-- pgsql pg
-- postgresql pg
insert into pg_ts_dict
(select 'pg_dict',dict_init,
'/usr/local/pgsql-dev/share/contrib/pg_dict.txt',
dict_lexize, 'pg-specific dictionary'
from pg_ts_dict
where dict_name='synonym'
);
-- register ispell dictionary, check paths and stop words
-- I converted the English files with iconv, since there is some Cyrillic content
insert into pg_ts_dict
(SELECT 'en_ispell', dict_init,
'DictFile="/usr/local/share/dicts/ispell/utf8/english-utf8.dict",'
'AffFile="/usr/local/share/dicts/ispell/utf8/english-utf8.aff",'
'StopFile="/usr/local/share/dicts/ispell/utf8/english-utf8.stop"',
dict_lexize
FROM pg_ts_dict
WHERE dict_name = 'ispell_template'
);
-- use the same stop-word list as 'en_ispell' dictionary
UPDATE pg_ts_dict set dict_initoption='/usr/local/share/dicts/ispell/utf8/english-utf8.stop'
where dict_name='en_stem';
-- default token<->dicts mappings
insert into pg_ts_cfgmap select 'pg', tok_alias, dict_name from public.pg_ts_cfgmap where ts_name='default';
-- modify the mappings for Latin words in configuration 'pg'
update pg_ts_cfgmap set dict_name = '{pg_dict,en_ispell,en_stem}'
where tok_alias in ( 'lword', 'lhword', 'lpart_hword' )
and ts_name = 'pg';
-- don't index or search these token types
update pg_ts_cfgmap set dict_name = NULL
--where tok_alias in ('email', 'url', 'sfloat', 'uri', 'float','word')
where tok_alias in ('email', 'url', 'sfloat', 'uri', 'float')
and ts_name = 'pg';
end;
-- testing
select * from ts_debug('
PostgreSQL, the highly scalable, SQL compliant, open source object-relational
database management system, is now undergoing beta testing of the next
version of our software: PostgreSQL 8.2.
');
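To see why the order in '{pg_dict,en_ispell,en_stem}' matters, here is a toy Python model of how a configuration turns a token into lexemes: the token type selects a dictionary chain, a NULL mapping drops the token, and the first dictionary that returns a non-NULL result wins (an empty result means a stop word). It is a sketch, not tsearch2's code; the en_ispell lexicon, the one-rule en_stem and the lowercasing step are hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Optional

# Toy model of a tsearch2 dictionary chain; NOT tsearch2's own code.
# Each dictionary returns:
#   None      -> token not recognized, try the next dictionary
#   []        -> recognized as a stop word, token is dropped
#   [lexemes] -> normalized form(s); no further dictionaries are tried
Dictionary = Callable[[str], Optional[List[str]]]

# Same "word synonym" lines as in the pg_dict.txt comment above.
PG_DICT_TXT = """\
postgres pg
pgsql pg
postgresql pg
"""

def load_synonyms(text: str) -> Dict[str, str]:
    """Parse 'word synonym' lines, skipping blank or malformed ones."""
    table: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            table[parts[0]] = parts[1]
    return table

SYNONYMS = load_synonyms(PG_DICT_TXT)

def pg_dict(tok: str) -> Optional[List[str]]:
    syn = SYNONYMS.get(tok)
    return [syn] if syn is not None else None

def en_ispell(tok: str) -> Optional[List[str]]:
    # Hypothetical tiny lexicon and stop-word list standing in for ispell.
    if tok in {"the", "of", "is"}:
        return []
    return {"databases": ["database"], "testing": ["test"]}.get(tok)

def en_stem(tok: str) -> Optional[List[str]]:
    # Crude stand-in for the Snowball stemmer; like it, accepts every word.
    return [tok[:-1] if tok.endswith("s") else tok]

LATIN_CHAIN: List[Dictionary] = [pg_dict, en_ispell, en_stem]

# Token type -> dictionary chain; None plays the role of dict_name = NULL.
CFGMAP: Dict[str, Optional[List[Dictionary]]] = {
    "lword": LATIN_CHAIN,
    "lhword": LATIN_CHAIN,
    "lpart_hword": LATIN_CHAIN,
    "email": None,
    "url": None,
}

def lexize_token(tok_alias: str, token: str) -> List[str]:
    chain = CFGMAP.get(tok_alias)
    if not chain:                # unmapped or NULL: token type is not indexed
        return []
    word = token.lower()         # assumption: tokens are lowercased first
    for d in chain:
        result = d(word)
        if result is not None:   # the first dictionary that recognizes it wins
            return result
    return []                    # no dictionary recognized the token

print(lexize_token("lword", "PostgreSQL"))  # ['pg']        via pg_dict
print(lexize_token("lword", "databases"))   # ['database']  via en_ispell
print(lexize_token("lword", "the"))         # []            stop word
print(lexize_token("lword", "releases"))    # ['release']   via en_stem
print(lexize_token("email", "a@b.nl"))      # []            not indexed
```

This is also why a stemmer, which accepts every word, belongs at the end of the chain: any dictionary placed after it would never be consulted.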
Oleg
On Wed, 3 Oct 2007, Alban Hertroys wrote:
> Hello,
>
> I'm trying to get a Dutch snowball stemmer in Postgres 8.1, but I can't
> find how to do that.
>
> I found CREATE FULLTEXT DICTIONARY commands in the tsearch2 docs on
> http://www.sai.msu.su/~megera/postgres/fts/doc/index.html, but these
> commands are apparently not available on PG8.1.
>
> I also found the tables pg_ts_(cfg|cfgmap|dict|parser), but I have no
> idea how to add a Dutch stemmer to those.
>
> I did find some references to stem.[ch] files that are supposed to be
> compiled into the Postgres sources, but I cannot believe that's the right
> way to do this (besides, I don't have sufficient privileges to
> install such a version).
>
> So... How do I do this?
>
> The system involved is some version of Debian Linux (2.6 kernel); are
> there any packages for a Dutch stemmer maybe?
>
> I'm in a bit of a hurry too, as we're on a tight deadline :(
>
> Regards,
>
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83