TSearch queries with multiple languages

From: Gordon Callan <gordon_callan(at)hotmail(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: TSearch queries with multiple languages
Date: 2009-02-12 23:38:40
Message-ID: BLU143-W45DD0FE50B08E8D41F362682BB0@phx.gbl
Lists: pgsql-general


Greetings,

I'm implementing full text search at our company using Tsearch2. I have read Chapter 12 (Full Text Search) numerous times and am still unclear about something.

All our data is stored in PostgreSQL in Unicode.
The data to be searched can be in a number of different languages.

I plan to create a tsvector column for each data column to be searched, plus an additional regconfig column in which
each row holds the configuration name that matches the language of that row's data.

So, for example, we have a table called node with columns node_id and body.

We add 2 columns, ts_body and ts_config: ts_body contains the tsvector data of body, and ts_config contains the
configuration (language).

ALTER TABLE node ADD COLUMN ts_body tsvector, ADD COLUMN ts_config regconfig;

At install time, the ts_config column will be populated so that it contains the language/config for each row. We will also
provide a means to keep the ts_body column updated each time the underlying body data changes.

We then generate the tsvector column using this configuration:
UPDATE node SET ts_body = to_tsvector(ts_config, body);

Presumably, this will generate tsvector data for every row, using what's in the regconfig column to determine the language.
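
One possible way to keep ts_body current after this initial load is the built-in tsvector_update_trigger_column trigger function, which takes the tsvector column, the regconfig column, and the text column(s) as arguments. This is only a sketch: the trigger name node_ts_body_update is made up for illustration, and it assumes body is a text or varchar column.

```sql
-- Hypothetical trigger name; assumes body is text/varchar and ts_config is regconfig.
-- Recomputes ts_body from body, using the row's own ts_config, on every insert or update.
CREATE TRIGGER node_ts_body_update
    BEFORE INSERT OR UPDATE ON node
    FOR EACH ROW
    EXECUTE PROCEDURE tsvector_update_trigger_column(ts_body, ts_config, body);
```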

Next we create an index on the tsvector column:
CREATE INDEX node_ts_body ON node USING gin(ts_body);

From the documentation, it seems this index will know what config each row has.

OK, now here's where the documentation is sketchy.

When searching, we will generate SQL like this:

SELECT *
FROM node
WHERE (ts_body @@ to_tsquery('english','foo & bar'));

Assuming we have 3 different configurations (spread across various rows and recorded in the regconfig column), rows in which language(s) will be returned in the result set? All 3 languages? Or does it depend on default_text_search_config?
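
For comparison, if the intent were to see only rows of one language, a variant that makes this explicit might look like the sketch below. It assumes the ts_config column from above and that comparing it against a regconfig literal behaves as expected; it is not a statement of what the plain query returns.

```sql
-- Sketch: restrict matches to rows whose stored configuration is 'english',
-- and parse the query text with that same configuration.
SELECT *
FROM node
WHERE ts_config = 'english'::regconfig
  AND ts_body @@ to_tsquery('english', 'foo & bar');
```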

Thanks for your help,
Gordon

