processing urls with tsearch2

From: "Laimonas Simutis" <laimis(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: processing urls with tsearch2
Date: 2007-09-13 18:35:38
Message-ID: 2b3e22740709131135o63e2d281k2efe27ebaf20a715@mail.gmail.com
Lists: pgsql-general

Hey guys,

maybe someone using tsearch2 could advise on this. With the default
installation, url, host and some other token types are processed with the
simple dictionary. Thus a term like mywebsite.com gets stored as
'mywebsite.com'. The parser correctly assigns the host token type to the
term, but the dictionary the term is then routed through is simple, so what
gets stored is mywebsite.com.

The questions are:

1) Is there a dictionary available that I could use to remove .com, .net,
.org, etc.? I could write one myself, but after looking at some sample
dictionary implementations, and the C code I try to avoid, I got a bit
scared.

2) Has anyone else dealt with this, maybe in a different way?
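
To make "a different way" concrete, here is a rough sketch of the kind of workaround I have in mind: strip the suffix in application code before the text ever reaches tsearch2, instead of inside a dictionary. This is not a tsearch2 feature. The function name strip_tld and the suffix list are made up for illustration, and it assumes the host tokens can be isolated before indexing.

```python
# Hypothetical suffix list; extend it with whatever endings should be dropped.
TLD_SUFFIXES = (".com", ".net", ".org")


def strip_tld(host: str) -> str:
    """Lowercase a host token and drop one trailing suffix from TLD_SUFFIXES."""
    token = host.lower()
    for suffix in TLD_SUFFIXES:
        # Leave the token alone if stripping would leave an empty string.
        if token.endswith(suffix) and len(token) > len(suffix):
            return token[: -len(suffix)]
    return token


if __name__ == "__main__":
    print(strip_tld("mywebsite.com"))
```

The obvious downside is that the same normalization would have to be applied to search queries as well, otherwise a search for mywebsite.com would no longer match the stored term.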

Thanks for any suggestions and help,

Laimis
