From: | Nicolas Grilly <nicolas(at)gardentechno(dot)com> |
---|---|
To: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: ts_rank seems very slow (140 ranked documents / second on my machine) |
Date: | 2011-07-13 08:45:13 |
Message-ID: | CAG3yVS5ZedzuxorOaDVhuMC1tW-FoZ+bf7r1+p3A7UHaCsbBag@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi Oleg and all,
On Wed, Jul 13, 2011 at 08:16, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
> there is problem with estimating of cost scanning gin index in < 9.1
> versions,
> so you can set enable_seqscan=off;
> or try 9.1 which beta3 now.
I re-ran my queries using enable seqscan=off.
Now the first query, without ts_rank, uses the GIN index:
set enable_seqscan=off;
explain analyze select id
from posts_1000000
where to_tsquery('english', 'crare') @@ document_vector
limit 50;
Limit (cost=42290.12..42306.31 rows=50 width=4) (actual
time=16.259..16.412 rows=50 loops=1)
-> Bitmap Heap Scan on posts_1000000 (cost=42290.12..57877.02
rows=48152 width=4) (actual time=16.256..16.344 rows=50 loops=1)
Recheck Cond: ('''crare'''::tsquery @@ document_vector)
-> Bitmap Index Scan on index_posts_documents_1000000
(cost=0.00..42278.08 rows=48152 width=0) (actual time=13.265..13.265
rows=49951 loops=1)
Index Cond: ('''crare'''::tsquery @@ document_vector)
Total runtime: 16.484 ms
But the second query, the one that uses ts_rank, is still very slow...
Any idea why? Is ts_rank efficient enough to find the best 50 matches
among 50 000 documents?
set enable_seqscan=off;
explain analyze select id
from posts_1000000
where to_tsquery('english', 'crare') @@ document_vector
order by ts_rank_cd(document_vector, to_tsquery('english', 'crare'), 32) desc
limit 50;
Limit (cost=59596.98..59597.10 rows=50 width=22) (actual
time=296212.052..296212.257 rows=50 loops=1)
-> Sort (cost=59596.98..59717.36 rows=48152 width=22) (actual
time=296186.928..296187.007 rows=50 loops=1)"
Sort Key: (ts_rank_cd(document_vector, '''crare'''::tsquery, 32))
Sort Method: top-N heapsort Memory: 27kB
-> Bitmap Heap Scan on posts_1000000
(cost=42290.12..57997.40 rows=48152 width=22) (actual
time=70.861..296059.515 rows=49951 loops=1)
Recheck Cond: ('''crare'''::tsquery @@ document_vector)
-> Bitmap Index Scan on index_posts_documents_1000000
(cost=0.00..42278.08 rows=48152 width=0) (actual time=24.922..24.922
rows=49951 loops=1)
Index Cond: ('''crare'''::tsquery @@ document_vector)
Total runtime: 296220.493 ms
>> By the way, does ts_rank is supposed to use a GIN index when it's
>> available?
>
> no, I see no benefit :)
Ok. But what is the solution to improve ts_rank execution time? Am I
doing something wrong?
Thanks for your help,
Nicolas
From | Date | Subject | |
---|---|---|---|
Next Message | Andrus | 2011-07-13 10:51:45 | Server stops responding randomly for 5 minutes |
Previous Message | Giuseppe Sacco | 2011-07-13 07:49:00 | About permissions on large objects |