Re: PostgreSQL GIN index not used when ts_query language is fetched from a column

From: Jaap Roes <jroes(at)leukeleu(dot)nl>
To: Jaap Roes <jroes(at)leukeleu(dot)nl>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL GIN index not used when ts_query language is fetched from a column
Date: 2016-09-15 19:13:40
Message-ID: C8E95F2B-BF9E-47A3-9965-7750043C9822@leukeleu.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


> On 15 Sep 2016, at 14:46, Jaap Roes <jroes(at)leukeleu(dot)nl> wrote:
>
> I've got a table that stores some multilingual content:
>
> CREATE TABLE search (
> content text NOT NULL,
> language regconfig NOT NULL,
> fulltext tsvector
> );
> CREATE INDEX search_fulltext ON search USING GIN(fulltext);
...
> To make sure I always search in the correct language I use these queries:
>
> SELECT FROM search WHERE fulltext @@ to_tsquery(language, 'shine');
> (1 row)
>
> SELECT FROM search WHERE fulltext @@ to_tsquery(language, 'vlieg');
> (1 row)
>
> Because hardcoding the language doesn't give the correct results:
>
> SELECT FROM search WHERE fulltext @@ to_tsquery('dutch', 'shine');
> (0 rows)
>
> SELECT FROM search WHERE fulltext @@ to_tsquery('english', 'vlieg');
> (0 rows)
>
> The problem however is that PostgreSQL doesn't use the GIN index when using the first set of queries and instead does a sequential scan:
...
> EXPLAIN ANALYZE SELECT * FROM search WHERE fulltext @@ to_tsquery(language, 'shine’);
>
> Seq Scan on search (cost=0.00..17.35 rows=2 width=136) (actual time=0.040..0.044 rows=1 loops=1)
> Filter: (fulltext @@ to_tsquery(language, 'shine'::text))
> Rows Removed by Filter: 5
> Planning time: 0.039 ms
> Execution time: 0.064 ms
>
> While it does when hardcoding a language:
>
> EXPLAIN ANALYZE SELECT FROM search WHERE fulltext @@ to_tsquery('dutch', 'vlieg');
> Bitmap Heap Scan on search (cost=12.63..23.66 rows=82 width=0) (actual time=0.044..0.044 rows=1 loops=1)
> Recheck Cond: (fulltext @@ '''vlieg'''::tsquery)
> Heap Blocks: exact=1
> -> Bitmap Index Scan on search_fulltext (cost=0.00..12.61 rows=82 width=0) (actual time=0.037..0.037 rows=1 loops=1)
> Index Cond: (fulltext @@ '''vlieg'''::tsquery)
> Planning time: 0.128 ms
> Execution time: 0.065 ms
>
> So my question is: Is it at all possible to use a column in the ts_query to use the correct language config and still have Postgres use the GIN index?

So I came up with a solution. Pre-localise the query in a join and use that to filter the matches:

SELECT * FROM search s
INNER JOIN (
SELECT 'dutch'::regconfig AS language, to_tsquery('dutch', 'shine') as q
UNION SELECT 'english'::regconfig AS language, to_tsquery('english', 'shine') as q
UNION SELECT 'simple'::regconfig AS language, to_tsquery('simple', 'shine') as q
) q ON (s.language=q.language)
WHERE fulltext @@ q;

This seems to work, but the query plan looks a bit confusing, so I’m not super confident about the correctness:

Nested Loop (cost=205.44..1327.12 rows=188 width=1590) (actual time=3.350..7.010 rows=16 loops=1)
-> Unique (cost=0.08..0.11 rows=3 width=0) (actual time=0.010..0.021 rows=3 loops=1)
-> Sort (cost=0.08..0.09 rows=3 width=0) (actual time=0.008..0.011 rows=3 loops=1)
Sort Key: ('dutch'::regconfig), ('''vlieg'''::tsquery)
Sort Method: quicksort Memory: 25kB
-> Append (cost=0.00..0.06 rows=3 width=0) (actual time=0.001..0.002 rows=3 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.000 rows=1 loops=1)
-> Bitmap Heap Scan on search s (cost=205.36..441.70 rows=63 width=1554) (actual time=2.310..2.318 rows=5 loops=3)
Recheck Cond: ((fulltext @@ ('''vlieg'''::tsquery)) AND ((language)::oid = (('dutch'::regconfig))::oid))
Heap Blocks: exact=16
-> BitmapAnd (cost=205.36..205.36 rows=63 width=0) (actual time=2.303..2.303 rows=0 loops=3)
-> Bitmap Index Scan on search_fulltext (cost=0.00..17.41 rows=188 width=0) (actual time=0.018..0.018 rows=16 loops=3)
Index Cond: (fulltext @@ ('''vlieg'''::tsquery))
-> Bitmap Index Scan on search_language (cost=0.00..187.67 rows=12539 width=0) (actual time=2.277..2.277 rows=12539 loops=3)
Index Cond: ((language_config)::oid = (('dutch'::regconfig))::oid)
Planning time: 0.228 ms
Execution time: 7.058 ms

Is this the way to go? Or is there a better way, I’m eager to find out!

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Arun Rangarajan 2016-09-15 19:25:44 Unable to create oracle_fdw (foreign data wrapper) extension
Previous Message Leonardo M. Ramé 2016-09-15 15:14:05 Re: Test letter