From: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: default_text_search_config and expression indexes |
Date: | 2007-08-09 05:53:58 |
Message-ID: | Pine.LNX.4.64.0708090935260.18739@sn.sai.msu.ru |
Lists: | pgsql-advocacy pgsql-hackers |
On Wed, 8 Aug 2007, Bruce Momjian wrote:
> Heikki Linnakangas wrote:
>>>>> Sure, but you have to make sure you use the right configuration in the
>>>>> trigger, no? Does the tsquery have to use the same configuration?
>>>> I wish I knew this myself. :-) Whatever I had done happened to work
>>>> but that was largely through people on IRC walking me through it.
>>>
>>> This illustrates the major issue --- that this has to be simple for
>>> people to get started, while keeping the capabilities for experienced
>>> users.
>>>
>>> I am now thinking that making users always specify the configuration
>>> name and not allowing :: casting is going to be the best approach. We
>>> can always add more in 8.4 after it is in wide use.
>>
>> I just read the docs and I'm trying to get a grip of the problem here.
>>
>> If I understood correctly, the basic issue is that a tsvector datum
>> created using configuration A is incompatible with a tsquery datum
>> created using configuration B, in the sense that you won't get
>> reasonable results if you use the tsquery to search the tsvector, or do
>> ranking or highlighting. If the configurations happen to be similar
>> enough, it can work, but not in general.
>
> Right.
That's not fair. There are many cases where one can intentionally use different
configurations. But I agree, this is not for beginners.
>
>> That underlying issue manifests itself in many ways, including:
>> - if you create a table with a field of type tsvector, typically kept
>> up-to-date by triggers, and do a search on it using a different
>> configuration, you get incorrect results.
>
> Right.
Again, you might want to use a different configuration.
>
>> - using an expression index instead of a tsvector-field, and always
>> explicitly specifying the configuration, you can avoid that problem (a
>> query with a different configuration won't use the index). But an
>> expression index, without explicitly specifying the configuration, will
>> get corrupted if you change the default configuration.
>
> Right.
The same problem arises if you (accidentally) drop a constraint from a table
and then get surprised by the SELECT results.
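
To make the expression-index case concrete, here is a minimal sketch. The table `docs`, the column `body`, the index name, and the `english` configuration are hypothetical names chosen for illustration; they do not come from this thread.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE docs (id serial PRIMARY KEY, body text);

-- The configuration is named explicitly in the index expression, so the
-- indexed tsvectors do not depend on default_text_search_config.
CREATE INDEX docs_body_fts_idx ON docs USING gin (to_tsvector('english', body));

-- A query can use the index only if it repeats the same expression,
-- including the same configuration name.
SELECT id
FROM docs
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'configuration');
```

A query written with a different configuration name does not match the index expression, so it will not use this index rather than silently searching vectors built under another configuration.
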
>
>> Removing the default configuration setting altogether removes the 2nd
>> problem, but that's not good from a usability point of view. And it
>> doesn't solve the general issue, you can still do things like:
>> SELECT * FROM foo WHERE to_tsvector('confA', textcol) @@
>> to_tsquery('confB', 'query');
>
> True, but in that case you are specifically naming different
> configurations, so it is hopefully obvious you have a mismatch.
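
The mismatch can be seen with two concrete configurations. This sketch assumes that configurations named `english` (stemming) and `simple` (no stemming) are available under those names; the sample text is made up.

```sql
-- Same configuration on both sides: 'cats' in the document and 'cat' in the
-- query are reduced to the same lexeme, so a match is expected.
SELECT to_tsvector('english', 'The cats were sleeping') @@ to_tsquery('english', 'cat');

-- Mixed configurations: 'simple' does no stemming and keeps 'cats' as is,
-- while the query side is processed by 'english' as 'cat', so no match is expected.
SELECT to_tsvector('simple', 'The cats were sleeping') @@ to_tsquery('english', 'cat');
```
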
>
>> ISTM we should have a separate tsvector and tsquery data type for each
>> configuration, and throw an error if you try to mix and match them in a
>> query. to_tsquery and to_tsvector would be new kind of polymorphic
>> functions that work with the types. Or we could automatically create a
>> copy of them when you create a new configuration. We could have a
>> default configuration setting and rewrite queries that don't explicitly
>> specify a configuration to use the default.
>
> That is going to make multiple configurations quite complex in the
> backend, and I think for little value.
>
>> You could still get into trouble if you alter the configuration after
>> starting to use it. We could solve that by not allowing you to ALTER
>> CONFIGURATION, at least not if it's used in tables or indexes. Forcing
>> people to create a new configuration, and to recreate all indexes and
>> tsvector columns every time you add a word to a stop-list, for example,
>> seems too onerous, though. Not sure what to do about that.
>
> Yea, seems more work than is necessary. If we require the configuration
> to be always supplied, and document that mismatches are a problem, I
> think we are in good shape.
We should agree that everything you describe is only for DUMMY users.
From an author's point of view, I dislike your approach of treating text searching
as a very limited tool. But I understand that we should protect people from
stupid errors.
I want easy setup and error-proof functionality for beginners,
while leaving experienced users free to develop complex search engines.
Can we have a separate safe interface for text searching and explicitly
recommend it for beginners?
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83