Re: PostgreSQL GIN index not used when ts_query language is fetched from a column

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Jaap Roes <jroes(at)leukeleu(dot)nl>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: PostgreSQL GIN index not used when ts_query language is fetched from a column
Date: 2016-09-15 19:29:39
Message-ID: CAMkU=1w_s+2KsK5GqPXMMEog+WjG86HtqGe=_txQXvvC_ShfjQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, Sep 15, 2016 at 5:46 AM, Jaap Roes <jroes(at)leukeleu(dot)nl> wrote:

> I've got a table that stores some multilingual content:
>
> CREATE TABLE search (
> content text NOT NULL,
> language regconfig NOT NULL,
> fulltext tsvector
> );
> CREATE INDEX search_fulltext ON search USING GIN(fulltext);
>
> INSERT INTO search (language, content) VALUES
> ('dutch', 'Als achter vliegen vliegen vliegen vliegen vliegen
> vliegen achterna'),
> ('dutch', 'Langs de koele kali liep een kale koeli met een kilo kali
> op zijn kale koeli-kop.'),
> ('dutch', 'Moeder sneed zeven scheve sneden brood'),
> ('english', 'I saw Susie sitting in a shoe shine shop. Where she
> sits she shines, and where she shines she sits.'),
> ('english', 'How can a clam cram in a clean cream can?'),
> ('english', 'Can you can a can as a canner can can a can?');
>
> UPDATE search SET fulltext = to_tsvector(language, content);
>
> To make sure I always search in the correct language I use these queries:
>
> SELECT FROM search WHERE fulltext @@ to_tsquery(language, 'shine');
> (1 row)
>

I don't think that what you want is possible to do in a clean way. It
can't know what the language for a row is until it finds the row, but it
can't find the row using an index until it runs to_tsquery, and it can't
run to_tsquery until it knows the language.

You would probably need to run the query once for each language, and filter
out the results which it found under the 'wrong' language.

SELECT * FROM search WHERE fulltext @@ to_tsquery('dutch', 'shine') and
language = 'dutch'::regconfig
union all
SELECT * FROM search WHERE fulltext @@ to_tsquery('english', 'shine') and
language = 'english'::regconfig;

If you have more than two languages, this will become tedious, and perhaps
non-performant. I am not aware of a more general solution, though.

It might help to build partial indexes on each language.

Cheers,

Jeff

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Scott Mead 2016-09-15 20:10:48 Re: Unable to create oracle_fdw (foreign data wrapper) extension
Previous Message John R Pierce 2016-09-15 19:29:32 Re: PgSQL versions supported on ubuntu 16 and debian 8