From: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Jan Urbański <j(dot)urbanski(at)students(dot)mimuw(dot)edu(dot)pl>, pgsql-patches(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, Teodor Sigaev <teodor(at)sigaev(dot)ru> |
Subject: | Re: a tsearch2 (8.2.4) dictionary that only filters out stopwords |
Date: | 2007-11-14 17:06:09 |
Message-ID: | Pine.LNX.4.64.0711141950040.7787@sn.sai.msu.ru |
Lists: | pgsql-hackers pgsql-patches |
In principle, the right way is to allow any dictionary to have an option
like 'PassThrough', plus an internal function get_dict_options(dict, option)
to check whether the PassThrough option is true.
Let's consider one example - removing accents.
In the past I always recommended that people use regex functions before the
to_tsvector conversion to remove accents, but recently I noticed that
this trick doesn't work with headline(). So the only way is to put a
special dictionary, dict_remove_accent, first in the stack, where it works
as a filter.
I don't remember why we left this for future releases, though.
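
To make the PassThrough idea concrete, here is a minimal sketch in Python rather than in the actual tsearch C API. Every name in it (remove_accents, lowercase_dict, run_stack, the pass_through flag) is hypothetical and serves only as an illustration; it assumes a filter dictionary rewrites the token and hands the result to the next dictionary instead of ending the lookup.

```python
import unicodedata

def remove_accents(token):
    """Hypothetical filter dictionary: strip combining accent marks."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def lowercase_dict(token):
    """Stand-in for a terminal dictionary: always recognizes the token."""
    return [token.lower()]

def run_stack(token, stack):
    """Run a token through a list of (dictionary, pass_through) pairs.

    A pass-through dictionary only rewrites the token; any other
    dictionary ends the lookup as soon as it returns a non-None result.
    """
    for dictionary, pass_through in stack:
        if pass_through:
            token = dictionary(token)  # filter: rewrite and keep going
            continue
        lexemes = dictionary(token)
        if lexemes is not None:
            return lexemes
    return None  # no dictionary recognized the token

stack = [(remove_accents, True), (lowercase_dict, False)]
print(run_stack("Hôtel", stack))  # prints ['hotel']
```

Because the filter sits inside the stack, anything that runs the same stack (headline() as well as to_tsvector) would see the same accent-free tokens, which is what the regex preprocessing trick cannot provide.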
Oleg
On Wed, 14 Nov 2007, Tom Lane wrote:
> This patch:
> http://archives.postgresql.org/pgsql-patches/2007-11/msg00137.php
> seems simple and useful enough that I think we ought to slip it into
> 8.3, even though we are far past feature freeze.
>
> As the "simple" dictionary type stands in CVS HEAD, it is only useful as
> the last dictionary in a stack, since it never passes anything on as
> unrecognized. With the proposed AcceptAll = false option, it could be
> used to filter out some stopwords before feeding tokens to another
> dictionary. While most dictionary types have their own stopword support,
> some of them match stopwords after their own normalization processing,
> and so there's no way to filter on pre-normalized words. That seems
> like a good improvement, even without the specific need-example that
> Jan provided at the start of the thread.
>
> Normally we'd never consider adding a new feature so late in the
> development cycle, but this seems small enough and useful enough
> to make an exception. Comments?
>
> regards, tom lane
>
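
The behaviour Tom describes can be sketched as follows. This is a toy Python model, not the tsearch implementation: make_simple_dict, toy_stemmer and lexize_with_stack are hypothetical names, and the return convention (an empty list for a stopword, None for "unrecognized, try the next dictionary") is an assumption chosen to mirror the description above.

```python
def make_simple_dict(stopwords, accept_all=True):
    """Toy model of the "simple" dictionary with the proposed AcceptAll option."""
    stop = {w.lower() for w in stopwords}

    def lexize(token):
        word = token.lower()
        if word in stop:
            return []      # stopword: discard the token
        if accept_all:
            return [word]  # recognized: the lookup ends here
        return None        # unrecognized: pass on to the next dictionary

    return lexize

def toy_stemmer(token):
    """Hypothetical stand-in for a normalizing dictionary."""
    word = token.lower()
    return [word[:-1]] if word.endswith("s") else [word]

def lexize_with_stack(token, stack):
    """Return the first non-None result from the dictionaries in order."""
    for dictionary in stack:
        result = dictionary(token)
        if result is not None:
            return result
    return None

filtering = [make_simple_dict(["the", "a"], accept_all=False), toy_stemmer]
terminal = [make_simple_dict(["the", "a"], accept_all=True), toy_stemmer]

print(lexize_with_stack("The", filtering))   # prints []
print(lexize_with_stack("Cats", filtering))  # prints ['cat']
print(lexize_with_stack("Cats", terminal))   # prints ['cats']
```

With accept_all=True the second dictionary is never consulted, which is why the "simple" type is only useful last in a stack; with accept_all=False it removes stopwords from the raw token and leaves normalization to whatever follows.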
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83