From: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Bruce Momjian <bruce(at)momjian(dot)us>, Michael Paesold <mpaesold(at)gmx(dot)at>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: How does the tsearch configuration get selected? |
Date: | 2007-06-15 04:00:10 |
Message-ID: | Pine.LNX.4.64.0706150745090.1881@sn.sai.msu.ru |
Lists: | pgsql-advocacy pgsql-hackers |
On Thu, 14 Jun 2007, Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>> First, why are we specifying the server locale here since it never
>> changes:
The server's locale is used for just one purpose: to select which text search
configuration to use by default. Any text search function can accept a
text search configuration as an optional parameter.
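To illustrate the two call forms, here is a minimal sketch. The function names come from this thread. The configuration name 'english' and the sample strings are assumptions for illustration only, and the exact mechanism that picks the default is what is being discussed here.

```sql
-- Explicit configuration: the result does not depend on any server setting.
SELECT to_tsvector('english', 'The quick brown foxes');
SELECT to_tsquery('english', 'fox & quick');

-- Configuration omitted: the default configuration is used instead,
-- so the result can differ from one server or session to another.
SELECT to_tsvector('The quick brown foxes');
SELECT to_tsquery('fox & quick');
```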
>
> It's poorly described. What it should really say is the language
> that the text-to-be-searched is in. We can actually support multiple
> languages here today, the restriction being that there have to be
> stemmer instances for the languages with the database encoding you're
> using. With UTF8 encoding this isn't much of a restriction. We do need
> to put code into the dictionary stuff to enforce that you can't use a
> stemmer when the database encoding isn't compatible with it.
>
> I would prefer that we not drive any of this stuff off the server's
> LC_xxx settings, since as you say that restricts things to just one
> locale.
Something like
CREATE TEXT SEARCH DICTIONARY dictname [LOCALE=ru_RU.UTF-8]
and raise a warning/error if the database encoding doesn't match the
dictionary's encoding, when one is specified (not all dictionaries depend on
encoding, so it should be an optional parameter).
>
>> Second, I can't figure out how to reference a non-default
>> configuration.
>
> See the multi-argument versions of to_tsvector etc.
>
> I do see a problem with having to_tsvector(config, text) plus
> to_tsvector(text) where the latter implicitly references a config
> selected by a GUC variable: how can you tell whether a query using the
> latter matches a particular index using the former? There isn't
> anything in the current planner mechanisms that would make that work.
Probably, having a default text search configuration is not a good idea,
and we could just make the configuration a mandatory parameter, which would
eliminate much of the confusion around selecting a text search configuration.
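The index-matching problem can be sketched as follows. This is only an illustration: the table docs, its columns, the index name, and the configuration name 'english' are hypothetical, and the planner behaviour described in the comments is the expression-matching limitation Tom describes, not a statement about any final design.

```sql
-- Hypothetical table and expression index built with an explicit configuration.
CREATE TABLE docs (id serial PRIMARY KEY, body text);
CREATE INDEX docs_body_fts ON docs
    USING gin (to_tsvector('english', body));

-- Same expression as the index definition, so the planner can match it.
SELECT id FROM docs
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'search & config');

-- One-argument form: the expression differs from the indexed one, and the
-- planner cannot know which configuration the default will resolve to,
-- so this query is not matched to docs_body_fts.
SELECT id FROM docs
WHERE to_tsvector(body) @@ to_tsquery('search & config');
```

With a mandatory configuration parameter, only the first query form would exist, and the question of which index a query matches would not arise.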
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83