FTS query, statistics and planner estimations…

From: Pierre Ducroquet <pierre(dot)ducroquet(at)people-doc(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: FTS query, statistics and planner estimations…
Date: 2016-11-09 09:22:53
Message-ID: 3135605.Z1OdYFN9ee@laptop-pierred
Lists: pgsql-general

Hello

I recently stumbled on a slow query in my database that showed an odd
behaviour related to the statistics of FTS queries.
The query does a few joins «after» running an FTS query on a main table.
The FTS query returns a few thousand rows, but the row estimates are wrong,
leading the optimizer to choose far worse plans than it should, and thus
to a much higher execution time.
I managed to isolate the odd behaviour in a single query, and I would like
your opinion about it.

I have modified the table name, columns and query to hide sensitive values,
but the issue remains the same. The table contains about 295,000 documents, and
everything runs on PostgreSQL 9.5.

EXPLAIN ANALYZE
SELECT COUNT(*)
FROM documents
WHERE
to_tsvector('french', subject || ' ' || body) @@ plainto_tsquery('XXX');

Of course, there is an index on to_tsvector('french', subject || ' ' || body).
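A minimal sketch of an index matching the query's expression, for anyone who wants to reproduce the setup. The schema, the index name `documents_fts_idx` and the choice of GIN are assumptions; the post only says that an index on this expression exists.

```sql
-- Hypothetical minimal schema so the snippet runs on its own; the real
-- table has more columns. IF NOT EXISTS leaves an existing table untouched.
CREATE TABLE IF NOT EXISTS documents (
    id      serial PRIMARY KEY,
    subject text,
    body    text
);

-- Index name is hypothetical; GIN is assumed (GiST can also index tsvector).
-- The two-argument to_tsvector is IMMUTABLE, which an index expression requires.
CREATE INDEX IF NOT EXISTS documents_fts_idx
    ON documents
    USING gin (to_tsvector('french', subject || ' ' || body));

-- ANALYZE also gathers statistics on the indexed expression; the planner
-- can use them when a query repeats that expression exactly.
ANALYZE documents;
```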

That query gives me the following results for several values of XXX:

 Search terms (XXX)              | Estimated rows | Real rows
---------------------------------+----------------+-----------
 'word1'                         |          38050 |     37500
 'word1 word2'                   |           4680 |     32000
 'word1 word2 word3'             |            270 |     12300
 'word1 word2 word3 word4'       |             10 |      9930
 'word1 word2 word3 word4 word5' |              1 |      9930

As you can see, the more words the query contains, the further the estimate
falls behind reality.
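One way to make the trend concrete is to compute, from the table above, the factor by which each extra word changes the estimated and the real row counts. This sketch only does arithmetic on the posted figures (no table needed). It assumes, as a hypothesis rather than something stated in the post, that the planner estimates the AND of the terms produced by plainto_tsquery as the product of their individual selectivities; under that assumption each estimated factor approximates one added word's selectivity. The column names are illustrative.

```sql
-- Figures copied from the results table; no database table is needed.
-- Assumption: an AND of terms is estimated as the product of per-term
-- selectivities, so each added word multiplies the estimate by a factor.
SELECT n_words,
       est_rows,
       real_rows,
       round(est_rows::numeric  / lag(est_rows)  OVER w, 3) AS est_factor,
       round(real_rows::numeric / lag(real_rows) OVER w, 3) AS real_factor
FROM (VALUES (1, 38050, 37500),
             (2,  4680, 32000),
             (3,   270, 12300),
             (4,    10,  9930),
             (5,     1,  9930)) AS t(n_words, est_rows, real_rows)
WINDOW w AS (ORDER BY n_words)   -- lag() compares each row with the previous word count
ORDER BY n_words;
```

On these figures the estimate is multiplied by roughly 0.04 to 0.12 per added word, while the real count is multiplied by 0.38 to 1.0, which would be consistent with the words co-occurring far more often than independent terms would. The final estimate of 1 row may also reflect the planner never estimating fewer than one row.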

Is this a known limitation of FTS indexing? Am I missing something
obvious, or is my configuration at fault?

Thanks a lot
