| From: | Tim van der Linden <tim(at)shisaa(dot)jp> |
|---|---|
| To: | pgsql-general(at)postgresql(dot)org |
| Subject: | Full text: Ispell dictionary |
| Date: | 2014-05-02 07:54:57 |
| Message-ID: | 20140502165457.afe301747b439f475b1d00a5@shisaa.jp |
| Lists: | pgsql-general |
Good morning/afternoon all
I am currently writing a few articles about PostgreSQL's full text capabilities and have a question about the Ispell dictionary which I cannot seem to find an answer to. It is probably a very simple issue, so forgive my ignorance.
In one article I am explaining dictionaries, and I have set up a sample configuration which maps most token categories to a single Ispell dictionary (timusan_ispell) with a default configuration:
CREATE TEXT SEARCH DICTIONARY timusan_ispell (
    TEMPLATE = ispell,
    DictFile = en_us,
    AffFile = en_us,
    StopWords = english
);
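The configuration that maps the token categories to this dictionary is not shown here. A minimal sketch of what such a mapping could look like, assuming a configuration named "timusan-ispell" (the name used in the query below) that is copied from the built-in english configuration; the list of token types is only an illustrative choice:

```sql
-- Assumption: the configuration starts as a copy of the built-in english one.
-- The name needs double quotes here because it contains a hyphen.
CREATE TEXT SEARCH CONFIGURATION "timusan-ispell" (
    COPY = pg_catalog.english
);

-- Route the word-like token types to the Ispell dictionary only.
-- Which token types to remap is a hypothetical choice for this sketch.
ALTER TEXT SEARCH CONFIGURATION "timusan-ispell"
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH timusan_ispell;
```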
When I run a simple query like "SELECT to_tsvector('timusan-ispell','smiling')" I get back the following tsvector:
'smile':1 'smiling':1
As you can see, I get two lexemes with the same position.
The question here is: why does this happen?
Is it normal behavior for the Ispell dictionary to emit multiple lexemes for a single token? And if so, is this efficient? I mean, why does it not simply store the single lexeme 'smile', which (as with the Snowball dictionary) would still match 'smiling' when later compared against the corresponding tsquery?
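For anyone who wants to reproduce this, the dictionary can be queried on its own, separately from the configuration. A rough sketch, assuming the timusan_ispell dictionary and the 'timusan-ispell' configuration described above exist in the current database:

```sql
-- Ask the dictionary directly which lexemes it returns for one token.
SELECT ts_lexize('timusan_ispell', 'smiling');

-- Show how the configuration parses the word, which dictionary
-- recognized it, and the lexemes that were produced.
SELECT alias, token, dictionaries, dictionary, lexemes
FROM ts_debug('timusan-ispell', 'smiling');

-- Check whether a query for the base form matches the stored vector.
SELECT to_tsvector('timusan-ispell', 'smiling')
       @@ to_tsquery('timusan-ispell', 'smile') AS matches;
```

If ts_lexize already returns both forms, the duplication comes from the dictionary itself rather than from the token mapping.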
Thanks!
Cheers,
Tim