Re: default_text_search_config and expression indexes

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: default_text_search_config and expression indexes
Date: 2007-08-09 05:53:58
Message-ID: Pine.LNX.4.64.0708090935260.18739@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-advocacy pgsql-hackers

On Wed, 8 Aug 2007, Bruce Momjian wrote:

> Heikki Linnakangas wrote:
>>>>> Sure, but you have make sure you use the right configuration in the
>>>>> trigger, no? Does the tsquery have to use the same configuration?
>>>> I wish I knew this myself. :-) Whatever I had done happened to work
>>>> but that was largely through people on IRC walking me through it.
>>>
>>> This illustrates the major issue --- that this has to be simple for
>>> people to get started, while keeping the capabilities for experienced
>>> users.
>>>
>>> I am now thinking that making users always specify the configuration
>>> name and not allowing :: casting is going to be the best approach. We
>>> can always add more in 8.4 after it is in wide use.
>>
>> I just read the docs and I'm trying to get a grip of the problem here.
>>
>> If I understood correctly, the basic issue is that a tsvector datum
>> created using configuration A is incompatible with a tsquery datum
>> created using configuration B, in the sense that you won't get
>> reasonable results if you use the tsquery to search the tsvector, or do
>> ranking or highlighting. If the configurations happen to be similar
>> enough, it can work, but not in general.
>
> Right.

not fair. There are many cases when one can intentionally use different
configurations. But I agree, this is not for beginners.

>
>> That underlying issue manifests itself in many ways, including:
>> - if you create table with a field of type tsvector, typically kept
>> up-to-date by triggers, and do a search on it using a different
>> configuration, you get incorrect results.
>
> Right.

again, you might want to use different configuration.

>
>> - using an expression index instead of a tsvector-field, and always
>> explicitly specifying the configuration, you can avoid that problem (a
>> query with a different configuration won't use the index). But an
>> expression index, without explicitly specifying the configuration, will
>> get corrupted if you change the default configuration.
>
> Right.

the same problem if you drop constrain from table (accidently) and then
gets surprised by select results.

>
>> Removing the default configuration setting altogether removes the 2nd
>> problem, but that's not good from a usability point of view. And it
>> doesn't solve the general issue, you can still do things like:
>> SELECT * FROM foo WHERE to_tsvector('confA', textcol) @@
>> to_tsquery('confB', 'query');
>
> True, but in that case you are specifically naming different
> configurations, so it is hopefully obvious you have a mismatch.
>
>> ISTM we should have a separate tsvector and tsquery data type for each
>> configuration, and throw an error if you try to mix and match them in a
>> query. to_tsquery and to_tsvector would be new kind of polymorphic
>> functions that work with the types. Or we could automatically create a
>> copy of them when you create a new configuration. We could have a
>> default configuration setting and rewrite queries that don't explicitly
>> specify a configuration to use the default.
>
> That is going to make multiple configurations quite complex in the
> backend, and I think for little value.
>
>> You could still get into trouble if you alter the configuration after
>> starting to use it. We could solve that by not allowing you to ALTER
>> CONFIGURATION, at least not if it's used in tables or indexes. Forcing
>> people to create a new configuration, and to recreate all indexes and
>> tsvector columns every time you add a word to a stop-list, for example,
>> seems too onerous, though. Not sure what to do about that.
>
> Yea, seems more work than is necessary. If we require the configuration
> to be always supplied, and document that mismatches are a problem, I
> think we are in good shape.

We should agree that all you describe is only for DUMMY users.
>From authors point of view I dislike your approach to treat text searching
as a very limited tool. But I understand that we should preserve people from
stupid errors.

I want for beginners easy setup and error-prone functionality,
but leaving experienced users to develop complex search engines.
Can we have separate safe interface for text searching and explicitly
recommend it for beginners ?

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

In response to

Responses

Browse pgsql-advocacy by date

  From Date Subject
Next Message Bruce Momjian 2007-08-09 06:36:41 Re: default_text_search_config and expression indexes
Previous Message Josh Berkus 2007-08-09 05:12:12 Re: EnterpriseDB Postgres

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavan Deolasee 2007-08-09 06:13:18 Re: HOT and INSERT/DELETE
Previous Message Jaime Casanova 2007-08-09 04:51:39 Re: Function structure in formatting.c