From: | Shane Hathaway <shane(at)hathawaymix(dot)org> |
---|---|
To: | François Beausoleil <francois(at)teksol(dot)info> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Ways to speed up ts_rank |
Date: | 2012-10-10 18:25:54 |
Message-ID: | 5075BDB2.2030507@hathawaymix.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On 10/10/2012 06:38 AM, François Beausoleil wrote:
> Notice the ts_rank() is on the outer query, which means it'll only
> run on the subset of documents which match the query. This is
> explicitly mentioned in the docs:
>
> """Ranking can be expensive since it requires consulting the tsvector
> of each matching document, which can be I/O bound and therefore slow.
> Unfortunately, it is almost impossible to avoid since practical
> queries often result in large numbers of matches."""
>
> (last paragraph of)
> http://www.postgresql.org/docs/current/static/textsearch-controls.html#TEXTSEARCH-RANKING
Indeed, I have studied that paragraph in depth, trying to gather as much
possible meaning from it as I can. :-)
However, the following two queries take exactly the same time,
suggesting to me that ts_rank_cd is really only looking at matching
rows, not all rows:
SELECT docid, coefficient * ts_rank_cd('{0.1, 0.2, 0.5, 1.0}',
text_vector, to_tsquery('english', 'stuff')) AS rank
FROM pgtextindex
WHERE (text_vector @@ to_tsquery('english', 'stuff'))
ORDER BY rank DESC
limit 3;
SELECT docid, coefficient * ts_rank_cd('{0.1, 0.2, 0.5, 1.0}',
text_vector, to_tsquery('english', 'stuff')) AS rank
FROM (SELECT * FROM pgtextindex
WHERE (text_vector @@ to_tsquery('english', 'stuff'))) AS filtered
ORDER BY rank DESC
limit 3;
Thanks for the suggestion though. By the way, all the tsvectors are
already loaded into the kernel cache when I execute the queries, so
ranking large documents is in fact CPU bound rather than I/O bound. The
CPU is pegged for the whole time.
Shane
From | Date | Subject | |
---|---|---|---|
Next Message | Shane Hathaway | 2012-10-10 18:32:55 | Re: Ways to speed up ts_rank |
Previous Message | Claudio Freire | 2012-10-10 18:24:42 | Re: shared_buffers/effective_cache_size on 96GB server |