Re: making tsearch2 dictionaries

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Ben <bench(at)silentmedia(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-general(at)postgresql(dot)org
Subject: Re: making tsearch2 dictionaries
Date: 2004-02-17 11:15:57
Message-ID: Pine.GSO.4.58.0402171337160.3452@ra.sai.msu.su
Lists: pgsql-general

On Mon, 16 Feb 2004, Ben wrote:

> So I noticed. ;) The dictionary's working, and I'd be happy to expand
> upon the documentation. Just point me at something to work on.
>

I think you could simply write a short paper, "How I built a custom dictionary
for tsearch2". From what I've read, your dictionary could be interesting to
other people, especially if you describe the motivation and usage.
Do you want '100' and 'hundred' to be fully equivalent? Then a search for
'100' will find documents containing 'hundred'. Interestingly, it will also
find '123', because '123' becomes 'one hundred twenty three'.
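
To make the '100' / '123' point concrete, here is a minimal standalone sketch of
spelling a number out as words. It is not Ben's dictionary and not tsearch2
code: number_to_words and append_word are hypothetical names, the range is
limited to 0..999, and it assumes each spelled-out word ends up as a separate
lexeme.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char *ONES[] = {
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen"
};
static const char *TENS[] = {
    "", "", "twenty", "thirty", "forty", "fifty",
    "sixty", "seventy", "eighty", "ninety"
};

/* Append word to buf, preceded by a space if buf is not empty.
 * Output is silently truncated if buf is too small. */
static void append_word(char *buf, size_t size, const char *word)
{
    size_t len = strlen(buf);

    if (len > 0 && len + 1 < size) {
        buf[len++] = ' ';
        buf[len] = '\0';
    }
    strncat(buf, word, size - len - 1);
}

/* Spell out n (0..999) as English words into buf (hypothetical helper). */
static void number_to_words(unsigned n, char *buf, size_t size)
{
    assert(buf != NULL && size > 0 && n < 1000);
    buf[0] = '\0';

    if (n >= 100) {
        append_word(buf, size, ONES[n / 100]);
        append_word(buf, size, "hundred");
        n %= 100;
        if (n == 0)
            return;
    }
    if (n >= 20) {
        append_word(buf, size, TENS[n / 10]);
        n %= 10;
        if (n == 0)
            return;
    }
    append_word(buf, size, ONES[n]);
}
```

Under these assumptions 100 is spelled "one hundred" and 123 is spelled
"one hundred twenty three", so both share the words 'one' and 'hundred', which
is why a query for '100' would also match a document containing '123'.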

> But, like I said, I really want to figure out a way to pipe the output
> of my dictionary through the another dictionary. If I can't do that, it
> doesn't seem as useful, because "100" (handled by my dictionary) and
> "one hundred" (handled by en_stem) currently don't generate the same
> ts_vector.

What's the problem? You can configure which dictionaries are used, and in what
order, for each type of token (see the pg_ts_cfgmap table).
Aha, I see your problem:

www=# select * from ts_debug('one hundred');
     ts_name     | tok_type | description |  token  | dict_name | tsvector
-----------------+----------+-------------+---------+-----------+----------
 default_russian | lword    | Latin word  | one     | {en_stem} | 'one'
 default_russian | lword    | Latin word  | hundred | {en_stem} | 'hundr'

'hundred' becomes 'hundr'. You could use a synonym dictionary, which is
rather simple
(see http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes for details).
Note that once a word is recognized by the synonym dictionary, it is not passed
on to the next dictionary! This is how tsearch2 works with any dictionary.
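
The "first dictionary that recognizes the token wins" rule can be sketched in
standalone C as below. This is only an illustration, not the tsearch2 API: the
lexize_fn type, synonym_lexize, toy_stem_lexize and lexize_chain are all
hypothetical, the stemmer is a toy that merely strips a trailing "ed", and the
single synonym entry (keeping 'hundred' unstemmed) is an assumed configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dictionary interface: return the normalized lexeme,
 * or NULL if the dictionary does not recognize the token. */
typedef const char *(*lexize_fn)(const char *token);

/* Toy synonym dictionary with one assumed entry: hundred -> hundred. */
static const char *synonym_lexize(const char *token)
{
    if (strcmp(token, "hundred") == 0)
        return "hundred";
    return NULL;                /* unknown word: let the next dictionary try */
}

/* Toy stand-in for a stemmer: strips a trailing "ed" from longer words.
 * It accepts every token, so it only makes sense last in a chain. */
static const char *toy_stem_lexize(const char *token)
{
    static char out[64];        /* overwritten on each call */
    size_t len = strlen(token);

    if (len >= sizeof out)
        return NULL;
    memcpy(out, token, len + 1);
    if (len > 4 && strcmp(out + len - 2, "ed") == 0)
        out[len - 2] = '\0';
    return out;
}

/* Try each dictionary in order and stop at the first one that
 * recognizes the token; later dictionaries never see it. */
static const char *lexize_chain(const lexize_fn *dicts, size_t ndicts,
                                const char *token)
{
    assert(dicts != NULL && token != NULL);
    for (size_t i = 0; i < ndicts; i++) {
        const char *lexeme = dicts[i](token);

        if (lexeme != NULL)
            return lexeme;
    }
    return NULL;
}
```

With only the toy stemmer in the chain, 'hundred' comes out as 'hundr'. With the
synonym dictionary placed first, 'hundred' is returned unchanged and the stemmer
is never consulted, while 'one' still falls through to the stemmer.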

>
> Once I figure out how to tweak the parser to parse things they way I
> want, I can expand upon those docs too. Looks like I'm going to need to
> reach waaaay back into my brain and dust off my flex knowledge for that,
> though....

What do you want from the parser?

>
> On Mon, 2004-02-16 at 10:33, Oleg Bartunov wrote:
> > btw, Ben, if you get your dictionary working, could you describe the
> > development process so other people can appreciate your work? This part
> > of the tsearch2 documentation is very weak.
> >
> > Oleg
> >
> > On Mon, 16 Feb 2004, Teodor Sigaev wrote:
> >
> > >
> > >
> > > Ben wrote:
> > > > Thanks for the replies. Just to clarify what I was doing, my pseudocode
> > > > looked something like:
> > > >
> > > > phrase = palloc(8);
> > > > phrase = "foo\0bar\0";
> > > > res = palloc(3);
> > > > res[0] = phrase[0];
> > > > res[1] = phrase[5];
> > > > res[2] = 0;
> > > >
> > > > That crashed. Once I changed it to:
> > > >
> > > > res = palloc(3);
> > > > res[0] = palloc(4);
> > > > res[0] = "foo\0";
> > > > res[1] = palloc(4);
> > > > res[2] = "bar\0";
> > > > res[3] = 0;
> > > >
> > > > it worked.
> > > >
> > > :)
> > > I hope you mean:
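> > > res = palloc(3 * sizeof(char *));  /* two lexemes plus a NULL terminator */
> > > res[0] = palloc(4);
> > > memcpy(res[0], "foo", 4);          /* 4 bytes: "foo" and its trailing '\0' */
> > > res[1] = palloc(4);
> > > memcpy(res[1], "bar", 4);
> > > res[2] = 0;                        /* terminator goes at index 2 */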
> > >
> > > Look at indexes of res.
> > >
> > >
> >
> > Regards,
> > Oleg
> > _____________________________________________________________
> > Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> > Sternberg Astronomical Institute, Moscow University (Russia)
> > Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> > phone: +007(095)939-16-83, +007(095)939-23-83
>
>
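
The allocation pattern in the quoted example (an array of separately allocated
strings ending in a null pointer) can be tried outside the server with the
sketch below. It is an illustration only: malloc and free stand in for palloc
and pfree, and make_lexeme_array and free_lexeme_array are hypothetical helper
names, not part of tsearch2.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Build a NULL-terminated array holding a fresh copy of each word.
 * Returns NULL if any allocation fails. malloc stands in for palloc. */
static char **make_lexeme_array(const char *const *words, size_t nwords)
{
    char **res;

    assert(words != NULL || nwords == 0);

    /* One pointer slot per word, plus one for the terminator. */
    res = malloc((nwords + 1) * sizeof(char *));
    if (res == NULL)
        return NULL;

    for (size_t i = 0; i < nwords; i++) {
        size_t len = strlen(words[i]) + 1;   /* include the trailing '\0' */

        res[i] = malloc(len);
        if (res[i] == NULL) {
            /* Undo the copies made so far before giving up. */
            while (i > 0)
                free(res[--i]);
            free(res);
            return NULL;
        }
        memcpy(res[i], words[i], len);
    }
    res[nwords] = NULL;                      /* terminator at index nwords */
    return res;
}

/* Release an array built by make_lexeme_array. */
static void free_lexeme_array(char **res)
{
    if (res == NULL)
        return;
    for (char **p = res; *p != NULL; p++)
        free(*p);
    free(res);
}
```

Called with the two words "foo" and "bar", this yields an array whose entries at
index 0 and 1 are the copies and whose entry at index 2 is the null terminator.
Assigning a string literal to an entry after allocating it would discard the
allocated buffer rather than fill it, which is why the copy uses memcpy.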

Regards,
Oleg
