From: | François Beausoleil <francois(at)teksol(dot)info> |
---|---|
To: | Shane Hathaway <shane(at)hathawaymix(dot)org> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Ways to speed up ts_rank |
Date: | 2012-10-10 12:38:28 |
Message-ID: | 546C964C-5BE7-464D-9EC6-5FC816AEE65E@teksol.info |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Le 2012-10-09 à 17:38, Shane Hathaway a écrit :
> Hello,
>
> The database has a text index of around 200,000 documents. Investigation revealed that text queries are slow only when using ts_rank or ts_rank_cd. Without a ts_rank function, any query is answered within 200ms or so; with ts_rank function, queries take up to 30 seconds. Deeper investigation using gprof showed that the problem is probably not ts_rank or ts_rank_cd, but the fact that those functions retrieve thousands of TOASTed tsvectors.
Is the query perhaps doing something like this:
SELECT ...
FROM table
WHERE tsvectorcol @@ plainto_tsquery('...')
ORDER BY ts_rank(...)
If so, ts_rank() is run for every document. What you should do instead is:
SELECT *
FROM (
SELECT ...
FROM table
WHERE tsvectorcol @@ plainto_tsquery('...')) AS t1
ORDER BY ts_rank(...)
Notice the ts_rank() is on the outer query, which means it'll only run on the subset of documents which match the query. This is explicitly mentioned in the docs:
"""Ranking can be expensive since it requires consulting the tsvector of each matching document, which can be I/O bound and therefore slow. Unfortunately, it is almost impossible to avoid since practical queries often result in large numbers of matches."""
(last paragraph of) http://www.postgresql.org/docs/current/static/textsearch-controls.html#TEXTSEARCH-RANKING
Hope that helps!
François Beausoleil
From | Date | Subject | |
---|---|---|---|
Next Message | Shaun Thomas | 2012-10-10 12:52:37 | Re: Hyperthreading (was: Two identical systems, radically different performance) |
Previous Message | Strahinja Kustudić | 2012-10-10 11:33:45 | Re: shared_buffers/effective_cache_size on 96GB server |