From: | Max <mail(at)to-the-max(dot)net> |
---|---|
To: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: GiST index not used for ORDER BY? |
Date: | 2005-01-27 21:52:20 |
Message-ID: | 6.2.1.2.0.20050127222524.02f697b8@mx1.to-the-max.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi,
At 09:54 PM 1/27/2005, you wrote:
>On Thu, 27 Jan 2005, Max wrote:
>>I'm setting up a simple search engine using Tsearch2.
>>The basic idea is: a user enters a search query and a maximum of 1000
>>results is returned, sorted by date, newest first.
>>
>>At the moment the table holding the searchable data has 1.1 million entries.
>>It works great when the search only produces a few hundred results.
>>However when people search on a common word with 10.000+ results, there's
>>a performance problem.
>>
>>CREATE TABLE posts_index
>>(
>>....
>> startdate INT NOT NULL,
>> idxFTI tsvector,
>>....
>>);
>>CREATE INDEX idxFTI_idx2 ON posts_index USING gist(idxFTI,(-startdate));
>I assume you already vacuum your db.
Yes, I did vacuum analyze it. And he does use the first part of the index
(idxFTI), just not the second part (-startdate).
> Hmm, seems you need to rewrite your query.
>EXPLAIN SELECT startdate, headline(subject,q) AS subject FROM ( SELECT
>startdate, subject from posts_index i,
> to_tsquery('default', '_SEARCH_TERM_') AS q WHERE idxfti @@ q ORDER BY
> (-i.startdate) LIMIT 1000) as foo;
>
>I bet your query will be much faster. In your query all founded tuples should
>be read from disk to calculate headline(), while in my query maximum 1000
>tuples will be read. So, performance gain could be noticeable, for example,
>if search returns 10,000 tuples, my query will be 10x faster than yours :)
>I think this is what you observed.
Thanks for your help, however headline() doesn't seem the problem.
Here's an EXPLAIN ANALYZE using your query and a common word as SEARCH_TERM:
------
QUERY PLAN
Subquery Scan foo (cost=5368809.49..5368824.49 rows=1000 width=181)
(actual time=363455.642..363510.277 rows=1000 loops=1)
-> Limit (cost=5368809.49..5368811.99 rows=1000 width=126) (actual
time=363454.387..363455.983 rows=1000 loops=1)
-> Sort (cost=5368809.49..5372006.34 rows=1278741 width=126)
(actual time=363454.380..363455.471 rows=1000 loops=1)
Sort Key: (- i.startdate)
-> Nested Loop (cost=0.00..5118844.92 rows=1278741
width=126) (actual time=0.140..354003.773 rows=343974 loops=1)
-> Function Scan on q (cost=0.00..12.50 rows=1000
width=32) (actual time=0.015..0.018 rows=1 loops=1)
-> Index Scan using idxfti_idx2 on posts_index
i (cost=0.00..5099.65 rows=1279 width=253) (actual time=0.111..353068.267
rows=343974 loops=1)
Index Cond: (i.idxfti @@ "outer".q)
Total runtime: 363571.960 ms
----
It still seems to rather sort 343.974 rows and take over 5 minutes to
complete, than use the index for the date. While searching on less common
words takes less than a second.Omitting headline() completely doesn't
changes anything either.
So it must be something else.
Regards,
Max
From | Date | Subject | |
---|---|---|---|
Next Message | William Yu | 2005-01-27 22:00:30 | Re: Splitting queries across servers |
Previous Message | Pavel Stehule | 2005-01-27 21:47:09 | Re: change encoding ? |