From: | Gregory Stark <stark(at)enterprisedb(dot)com> |
---|---|
To: | "Mike Rylander" <mrylander(at)gmail(dot)com> |
Cc: | "Bruce Momjian" <bruce(at)momjian(dot)us>, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Ron Mayer" <rm_pg(at)cheapcomplexdevices(dot)com>, <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: default_text_search_config and expression indexes |
Date: | 2007-08-14 22:17:19 |
Message-ID: | 87fy2loj68.fsf@oxford.xeocode.com |
Lists: | pgsql-advocacy pgsql-hackers |
"Mike Rylander" <mrylander(at)gmail(dot)com> writes:
> My application (http://open-ils.org, which run >80% of the public
> libraries in Georgia, USA, http://gapines.org and
> http://georgialibraries.org/lib/pines.html) requires that I be able to
> search a corpus of bibliographic records in a mix of languages, and
> potentially with mixed stop-word rules, with one query. I cannot know
> ahead of time what languages will be used in the corpus and I cannot
> restrict any one query to one language. To accomplish this, the
> record itself will be inspected inside an INSERT/UPDATE trigger to
> determine the language and type, and use the correct configuration for
> creating the tsvector. This will obviously result in a "mixed"
> tsvector column, but that's exactly what I need. I can filter on
> record language if the user happens to specify a query language (and
> thus configuration), or simply rank the assumed (IP based, perhaps, or
> browser preference based) preferred language higher, or one of a
> hundred other things. But I won't be able to do any of that if
> tsvectors are required to have one and only one configuration per
> column.
>
> Anyway, I felt I needed to provide some outside perspective to this,
> as a user, since it seems that the external viewpoint (my particular
> viewpoint, at least) was missing from the discussion.
This is *extremely* useful. I think it's precisely what we've been missing so
far. At least, what I've been missing.
So the question is what exactly happens in this case. If I search for "the",
does that mean it will ignore matches in English, where that's a stop-word, but
find me books on tea in French? Is that what I should expect to happen? What
if I search for "earl and the"? Does that find me French books on Earl Grey
tea but English books on all earls?
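To make the question concrete, here is a toy sketch in Python. It is not PostgreSQL code: the stop-word sets and the `to_terms` helper are made up for illustration, and the real dictionaries and parser certainly differ. It only shows how the same query string could reduce to different search terms depending on which configuration parses it.

```python
# Toy model only: STOP_WORDS and to_terms() are hypothetical illustrations,
# not PostgreSQL's text search implementation or its real stop-word lists.
STOP_WORDS = {
    "english": {"the", "and", "a", "of"},
    "french": {"le", "la", "les", "et", "de"},
}


def to_terms(text, config):
    """Lowercase, split on whitespace, and drop that configuration's stop-words."""
    stops = STOP_WORDS[config]
    return [word for word in text.lower().split() if word not in stops]


if __name__ == "__main__":
    for query in ("the", "earl and the"):
        for config in ("english", "french"):
            print(f"{query!r} under {config}: {to_terms(query, config)}")
```

In this model "the" leaves no terms at all under the English configuration but survives under the French one, and "earl and the" shrinks to just "earl" in English. Whether a query against a mixed column should behave this way per row is exactly what I'm asking.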
What happens if I use the same operator directly on the text column? Or
perhaps it's not even possible to specify stop-words when operating on a text
column? Should it be?
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com