Full text: Ispell dictionary

From:	Tim van der Linden <tim(at)shisaa(dot)jp>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Full text: Ispell dictionary
Date:	2014-05-02 07:54:57
Message-ID:	20140502165457.afe301747b439f475b1d00a5@shisaa.jp
Lists:	pgsql-general

Good morning/afternoon all

I am currently writing a few articles about PostgreSQL's full text capabilities and have a question about the Ispell dictionary which I cannot seem to find an answer to. It is probably a very simple issue, so forgive my ignorance.

In one article I explain dictionaries, and I have set up a sample configuration that maps most token categories to a single Ispell dictionary (timusan_ispell), which has a default configuration:

CREATE TEXT SEARCH DICTIONARY timusan_ispell (
    TEMPLATE = ispell,
    DictFile = en_us,
    AffFile = en_us,
    StopWords = english
);
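
For readers who want to reproduce the setup, a configuration that routes word tokens to this dictionary might look roughly like the sketch below. The configuration name ispell_only, the COPY source, and the list of token types are assumptions made for illustration, not the exact setup used in the message. The dictionary itself also requires en_us.dict and en_us.affix files in the server's tsearch_data directory.

```sql
-- Hypothetical configuration name; start from the built-in "simple" configuration.
CREATE TEXT SEARCH CONFIGURATION ispell_only (COPY = simple);

-- Send the common word-like token types to the Ispell dictionary only.
-- The token types listed here are an assumed choice for illustration.
ALTER TEXT SEARCH CONFIGURATION ispell_only
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH timusan_ispell;
```

Because no fallback dictionary follows timusan_ispell in the mapping, any token the Ispell dictionary does not recognize is dropped from the resulting tsvector.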

When I run a simple query like "SELECT to_tsvector('timusan-ispell','smiling')" I get back the following tsvector:

'smile':1 'smiling':1

As you can see, I get two lexemes with the same position.
The question here is: why does this happen?

Is it normal behavior for the Ispell dictionary to emit multiple lexemes for a single token? And if so, is this efficient? I mean, why could it not simply store a single lexeme, 'smile', which (as with the snowball dictionary) would also match 'smiling' when compared against the accompanying tsquery?
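
The comparison drawn above can be checked directly with ts_lexize, which shows what a single dictionary returns for a single token, independent of any configuration. The following is a minimal sketch: english_stem is the built-in snowball dictionary, and timusan_ispell is assumed to exist as defined earlier in this message.

```sql
-- Lexemes the Ispell dictionary emits for the token.
SELECT ts_lexize('timusan_ispell', 'smiling');

-- Lexemes the built-in snowball stemmer emits for the same token.
SELECT ts_lexize('english_stem', 'smiling');

-- With the built-in english configuration, both the document and the query
-- are normalized by the stemmer, so this tests whether they share a lexeme.
SELECT to_tsvector('english', 'smiling') @@ to_tsquery('english', 'smile');
```

Running the first two statements side by side shows whether the extra lexeme comes from the dictionary itself rather than from the configuration that wraps it.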

Thanks!

Cheers,
Tim

Responses

Re: Full text: Ispell dictionary at 2014-05-02 17:12:56 from Oleg Bartunov
