Re: Full text: Ispell dictionary

From: Tim van der Linden <tim(at)shisaa(dot)jp>
To: obartunov(at)gmail(dot)com
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Full text: Ispell dictionary
Date: 2014-05-10 01:31:02
Message-ID: 20140510103102.305fddc23f7e59f7254ecb16@shisaa.jp
Lists: pgsql-general

Hi Oleg

> btw, take a look at contrib/dict_xsyn, it's more powerful than the
> synonym dictionary.

Sorry for the late reply...and thank you for the tip.

I will check out xsyn soon. I am about to finish the third and final chapter of my full text series, but I could maybe write an "appendix" chapter which mentions xsyn...or just update my posts.
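
As a note to myself for that appendix, a minimal sketch of trying xsyn could look like the following. It assumes the dict_xsyn contrib module is installed on the server. The rules file name 'my_rules' and the word 'smile' are placeholders of mine, not something from this thread, and I have not run this against my own setup yet.

```sql
-- Assumption: the dict_xsyn contrib module is available on this server.
-- Installing it creates the xsyn_template template and a dictionary named xsyn.
CREATE EXTENSION IF NOT EXISTS dict_xsyn;

-- 'my_rules' is a hypothetical rules file, expected at
-- $SHAREDIR/tsearch_data/my_rules.rules.
-- Each line holds a word followed by its synonyms, separated by whitespace.
ALTER TEXT SEARCH DICTIONARY xsyn (RULES = 'my_rules', KEEPORIG = false);

-- Ask the dictionary directly what it emits for a single token.
SELECT ts_lexize('xsyn', 'smile');
```

With KEEPORIG = false the original word is left out of the result and only the synonyms come back; setting it to true would keep the original as well.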

Cheers,
Tim

> On Sat, May 3, 2014 at 2:26 AM, Tim van der Linden <tim(at)shisaa(dot)jp> wrote:
> > Hi Oleg
> >
> > Haha, understood!
> >
> > Thanks for helping me on this one.
> >
> > Cheers
> > Tim
> >
> >
> > On May 3, 2014 7:24:08 AM GMT+09:00, Oleg Bartunov <obartunov(at)gmail(dot)com>
> > wrote:
> >>
> >> Tim,
> >>
> >> you did answer yourself - don't use ispell :)
> >>
> >> On Sat, May 3, 2014 at 1:45 AM, Tim van der Linden <tim(at)shisaa(dot)jp> wrote:
> >>>
> >>> On Fri, 2 May 2014 21:12:56 +0400
> >>> Oleg Bartunov <obartunov(at)gmail(dot)com> wrote:
> >>>
> >>> Hi Oleg
> >>>
> >>> Thanks for the response!
> >>>
> >>>> Yes, it's normal for an ispell dictionary; think of it as a morphological
> >>>> dictionary.
> >>>
> >>>
> >>> Hmm, I see, that makes sense. I thought the morphological aspect of
> >>> Ispell only dealt with splitting up compound words, but it also deals with
> >>> reducing a word to a more "stem"-like form, correct?
> >>>
> >>> As a last question on this, is there a way to stop this dictionary from
> >>> emitting multiple lexemes?
> >>>
> >>>
> >>> The reason I am asking is that, in my (fairly new) understanding of
> >>> PostgreSQL's full text search, it is always best to have as few lexemes as
> >>> possible saved in the vector. This gives smaller indexes and faster matching
> >>> afterwards. Also, if you run a tsquery afterwards too, you can still employ
> >>> the power of these multiple lexemes to find a match.
> >>>
> >>> Or...probably answering my own question...if I do not desire this
> >>> behavior I should maybe not use Ispell and simply use another dictionary :)
> >>>
> >>> Thanks again.
> >>>
> >>> Cheers,
> >>> Tim
> >>>
> >>>> On Fri, May 2, 2014 at 11:54 AM, Tim van der Linden <tim(at)shisaa(dot)jp>
> >>>> wrote:
> >>>>>
> >>>>> Good morning/afternoon all
> >>>>>
> >>>>> I am currently writing a few articles about PostgreSQL's full text
> >>>>> capabilities and have a question about the Ispell dictionary which I
> >>>>> cannot seem to find an answer to. It is probably a very simple issue, so
> >>>>> forgive my ignorance.
> >>>>>
> >>>>> In one article I am explaining dictionaries and I have set up a
> >>>>> sample configuration which maps most token categories to use only an Ispell
> >>>>> dictionary (timusan_ispell) which has a default configuration:
> >>>>>
> >>>>> CREATE TEXT SEARCH DICTIONARY timusan_ispell (
> >>>>> TEMPLATE = ispell,
> >>>>> DictFile = en_us,
> >>>>> AffFile = en_us,
> >>>>> StopWords = english
> >>>>> );
> >>>>>
> >>>>> When I run a simple query like "SELECT
> >>>>> to_tsvector('timusan-ispell','smiling')" I get back the following tsvector:
> >>>>>
> >>>>> 'smile':1 'smiling':1
> >>>>>
> >>>>> As you can see I get two lexemes with the same position.
> >>>>> The question here is: why does this happen?
> >>>>>
> >>>>> Is it normal behavior for the Ispell dictionary to emit multiple
> >>>>> lexemes for a single token? And if so, is this efficient? I
> >>>>> mean, why could it not simply save one lexeme 'smile' which (same as
> >>>>> the snowball dictionary) would match 'smiling' as well if later matched with
> >>>>> the accompanying tsquery?
> >>>>>
> >>>>> Thanks!
> >>>>>
> >>>>> Cheers,
> >>>>> Tim
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> >>>>> To make changes to your subscription:
> >>>>> http://www.postgresql.org/mailpref/pgsql-general
> >>>
> >>>
> >>>
> >>> --
> >>> Tim van der Linden <tim(at)shisaa(dot)jp>

--
Tim van der Linden <tim(at)shisaa(dot)jp>
