From: Ben <bench(at)silentmedia(dot)com>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-general(at)postgresql(dot)org
Subject: Re: making tsearch2 dictionaries
Date: 2004-02-17 18:08:47
Message-ID: Pine.LNX.4.44.0402171000290.32605-100000@localhost.localdomain
Lists: pgsql-general
On Tue, 17 Feb 2004, Oleg Bartunov wrote:
> it's unpredictable and I still don't get your idea of pipelining, but
> in general, I have nothing against it.
Oh, well, the idea is that instead of the dictionary search stopping at
the first dictionary in the chain that returns a lexeme, it would take
each of the lexemes returned and pass them on to the next dictionary in
the chain.
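To make that concrete, here's a rough sketch in Python (just to
illustrate the semantics I mean; the real dictionaries are C functions,
and the pass-unknown-lexemes-through behavior is my assumption):

  # Each "dictionary" maps a token to a list of lexemes, or None if
  # it doesn't recognize the token.

  def lexize_stop_at_first(dictionaries, token):
      # What tsearch2 does now: the first dictionary that recognizes
      # the token produces the final lexemes.
      for d in dictionaries:
          lexemes = d(token)
          if lexemes is not None:
              return lexemes
      return []

  def lexize_pipelined(dictionaries, token):
      # What I'm proposing: every lexeme a dictionary returns is fed
      # to the next dictionary in the chain, and the union of what the
      # last stage produces is the final result.
      current = [token]
      for d in dictionaries:
          stage = []
          for lex in current:
              out = d(lex)
              # Assumption: a dictionary that doesn't know a lexeme
              # passes it through unchanged.
              stage.extend(out if out is not None else [lex])
          current = stage
      return sorted(set(current))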
So if I specified that numbers were to be handled by my num2english
dictionary, followed by en_stem, and then tried to get a vector for
"100", num2english would return "one" and "hundred". Then both "one"
and "hundred" would each be looked up in en_stem, and the union of
these lexemes would be the final result.
Similarly, if a Latin word gets piped through an ispell dictionary
before being sent to en_stem, each possible spelling would be stemmed.
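With stub dictionaries standing in for num2english and en_stem (the
trivial "stemmer" here is obviously not the real en_stem), the "100"
case above would look like:

  def num2english(token):
      # Stub: only spells out "100" for this example.
      return ["one", "hundred"] if token == "100" else None

  def en_stem(token):
      # Stub stemmer: the real en_stem does Snowball stemming;
      # here we just lowercase.
      return [token.lower()]

  print(lexize_pipelined([num2english, en_stem], "100"))
  # -> ['hundred', 'one']: both spelled-out words, each stemmed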
> Aha, the same way as we handle complex words with hyphens - we return
> the whole word and its parts. So you need to introduce a new type of
> token in the parser and use a synonym dictionary which, in its turn,
> will return the symbol token and a human-readable word.
Okay, that makes sense. I'll look more into how hyphenated words are being
handled now.