Re: integrated tsearch doesn't work with non utf8 database

From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Teodor Sigaev" <teodor(at)sigaev(dot)ru>, "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>, "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: integrated tsearch doesn't work with non utf8 database
Date: 2007-09-10 14:56:31
Message-ID: 46E55B1F.3090207@enterprisedb.com
Lists: pgsql-hackers

Tom Lane wrote:
> Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
>>> Note the Seq Scan on pg_ts_config_map, with filter on ts_lexize(mapdict,
>>> $1). That means that it will call ts_lexize on every dictionary, which
>>> will try to load every dictionary. And loading danish_stem dictionary
>>> fails in latin2 encoding, because of the problem with the stopword file.
>
>> Attached patch should fix it, I hope.
>
> Uh, how will that help? AFAICS it still has to call ts_lexize with
> every dictionary.

No, ts_lexize is no longer in the seq scan filter. It is now in the sort
key, which is calculated only for the rows that match the filter
'mapcfg=? AND maptokentype=?'. It is pretty kludgey, though: if the
statistics were different, the planner could choose another plan, one
that fails. Rewriting the function in C would be a more robust fix.
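A toy Python model of the difference described above, offered only as a sketch. The row layout, the failing dictionary set, and the sort key `(lexize result is None, mapseqno)` are hypothetical illustrations, not the actual patch or the PostgreSQL executor. The evaluation order here is fixed by the Python code, whereas in PostgreSQL it is chosen by the planner, which is exactly why the caveat above still applies.

```python
class DictionaryLoadError(Exception):
    """Raised by the stand-in lexize() when a dictionary cannot be loaded."""


# Hypothetical stand-ins for pg_ts_config_map rows:
# (mapcfg, maptokentype, mapseqno, mapdict)
ROWS = [
    (1, 1, 1, "english_stem"),
    (1, 2, 1, "simple"),
    (2, 1, 1, "danish_stem"),
]
# Assumption: this dictionary fails to load, as in the latin2 report.
BROKEN = {"danish_stem"}


def lexize(mapdict, word, calls):
    """Stand-in for ts_lexize: records each call, fails for a broken dictionary."""
    calls.append(mapdict)
    if mapdict in BROKEN:
        raise DictionaryLoadError(mapdict)
    return [word.lower()]


def plan_lexize_in_filter(cfg, tok, word, calls):
    """Old shape: the scan filter evaluates lexize() on every row it visits."""
    out = []
    for mapcfg, maptokentype, _seq, mapdict in ROWS:
        lexemes = lexize(mapdict, word, calls)  # runs even for rows the cheap tests reject
        if mapcfg == cfg and maptokentype == tok and lexemes is not None:
            out.append(mapdict)
    return out


def plan_lexize_in_sort_key(cfg, tok, word, calls):
    """Patched shape: cheap filter first, lexize() only as a sort key on survivors."""
    matching = [r for r in ROWS if r[0] == cfg and r[1] == tok]
    # list.sort calls the key function once per remaining element.
    matching.sort(key=lambda r: (lexize(r[3], word, calls) is None, r[2]))
    return [r[3] for r in matching]
```

With `cfg=1, tok=1`, the filter-shaped version touches all three dictionaries and raises `DictionaryLoadError` on `danish_stem`, while the sort-key version calls `lexize` only for `english_stem` and returns `["english_stem"]`.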

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com