Re: processing urls with tsearch2

From: "Laimonas Simutis" <laimis(at)gmail(dot)com>
To: "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: processing urls with tsearch2
Date: 2007-09-13 20:41:34
Message-ID: 2b3e22740709131341r6ceed867m4cb3beef27f874db@mail.gmail.com
Lists: pgsql-general

Is there any way to install the dictionary without running make? That is, are
binary versions of it available? I am running PostgreSQL on Windows servers...

On 9/13/07, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
>
> On Thu, 13 Sep 2007, Laimonas Simutis wrote:
>
> > Hey guys,
> >
> > maybe anyone using tsearch2 could advise on this. With the default
> > installation, url, host and some other tokens are processed with the
> > simple dictionary. Thus a term like mywebsite.com gets stored as
> > 'mywebsite.com'. The parser correctly assigns the token type host to the
> > term, but the dictionary the term gets routed through is simple, so what
> > gets stored is mywebsite.com
> >
> > The questions are:
> >
> > 1) is there a dictionary available that I could utilize that will remove
> > .com, .net, .org, etc? I could write one myself, but after seeing some
> > sample dictionary implementations and the C code, which I try to avoid,
> > I got a bit scared.
>
> Yes, we have dict_regex, which was developed by Sergey Karpov; see details at
> http://lynx.sao.ru/~karpov/software/postgres_dict_regex.html
> It uses the pcre library, and you need to know perl regexps.
>
> >
> > 2) has anyone else dealt with this maybe in a different way?
>
> sure, preprocess the text using your preferred language before passing it
> to to_tsvector
>
> >
> >
> > Thanks for any suggestions and help,
> >
> > Laimis
> >
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
>
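
For reference, the preprocessing approach Oleg suggests in answer to question 2 might look roughly like the sketch below. It is only an illustration, not something from the thread: the suffix list, the regular expression and the function name strip_tlds are all assumptions, and the cleaned string would still have to be passed to to_tsvector by the application afterwards.

```python
import re

# Assumption: a small hand-picked list of suffixes to drop; extend as needed.
TLD_SUFFIXES = ("com", "net", "org")

# Matches a dotted host name ending in one of the suffixes above and
# captures everything before the final ".suffix".
_HOST_RE = re.compile(
    r"\b((?:[a-z0-9-]+\.)*[a-z0-9-]+)\.(?:" + "|".join(TLD_SUFFIXES) + r")\b",
    re.IGNORECASE,
)


def strip_tlds(text: str) -> str:
    """Return text with a trailing .com/.net/.org removed from host names."""
    return _HOST_RE.sub(r"\1", text)


if __name__ == "__main__":
    cleaned = strip_tlds("Please visit mywebsite.com for details")
    print(cleaned)
    # The application would then hand `cleaned` to to_tsvector, for example
    # as a bound parameter of an INSERT or UPDATE statement.
```

With this in place the text reaches the parser without the suffix, so no custom dictionary has to be compiled on the server, which may matter on Windows installations without a build environment.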
