Re: GiST index not used for ORDER BY?

From: Max <mail(at)to-the-max(dot)net>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: GiST index not used for ORDER BY?
Date: 2005-01-27 21:52:20
Message-ID: 6.2.1.2.0.20050127222524.02f697b8@mx1.to-the-max.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

At 09:54 PM 1/27/2005, you wrote:
>On Thu, 27 Jan 2005, Max wrote:
>>I'm setting up a simple search engine using Tsearch2.
>>The basic idea is: a user enters a search query and a maximum of 1000
>>results is returned, sorted by date, newest first.
>>
>>At the moment the table holding the searchable data has 1.1 million entries.
>>It works great when the search only produces a few hundred results.
>>However when people search on a common word with 10.000+ results, there's
>>a performance problem.
>>
>>CREATE TABLE posts_index
>>(
>>....
>> startdate INT NOT NULL,
>> idxFTI tsvector,
>>....
>>);
>>CREATE INDEX idxFTI_idx2 ON posts_index USING gist(idxFTI,(-startdate));

>I assume you already vacuum your db.

Yes, I did vacuum analyze it. And he does use the first part of the index
(idxFTI), just not the second part (-startdate).

> Hmm, seems you need to rewrite your query.
>EXPLAIN SELECT startdate, headline(subject,q) AS subject FROM ( SELECT
>startdate, subject from posts_index i,
> to_tsquery('default', '_SEARCH_TERM_') AS q WHERE idxfti @@ q ORDER BY
> (-i.startdate) LIMIT 1000) as foo;
>
>I bet your query will be much faster. In your query all founded tuples should
>be read from disk to calculate headline(), while in my query maximum 1000
>tuples will be read. So, performance gain could be noticeable, for example,
>if search returns 10,000 tuples, my query will be 10x faster than yours :)
>I think this is what you observed.

Thanks for your help, however headline() doesn't seem the problem.
Here's an EXPLAIN ANALYZE using your query and a common word as SEARCH_TERM:

------
QUERY PLAN

Subquery Scan foo (cost=5368809.49..5368824.49 rows=1000 width=181)
(actual time=363455.642..363510.277 rows=1000 loops=1)
-> Limit (cost=5368809.49..5368811.99 rows=1000 width=126) (actual
time=363454.387..363455.983 rows=1000 loops=1)
-> Sort (cost=5368809.49..5372006.34 rows=1278741 width=126)
(actual time=363454.380..363455.471 rows=1000 loops=1)
Sort Key: (- i.startdate)
-> Nested Loop (cost=0.00..5118844.92 rows=1278741
width=126) (actual time=0.140..354003.773 rows=343974 loops=1)
-> Function Scan on q (cost=0.00..12.50 rows=1000
width=32) (actual time=0.015..0.018 rows=1 loops=1)
-> Index Scan using idxfti_idx2 on posts_index
i (cost=0.00..5099.65 rows=1279 width=253) (actual time=0.111..353068.267
rows=343974 loops=1)
Index Cond: (i.idxfti @@ "outer".q)
Total runtime: 363571.960 ms
----

It still seems to rather sort 343.974 rows and take over 5 minutes to
complete, than use the index for the date. While searching on less common
words takes less than a second.Omitting headline() completely doesn't
changes anything either.
So it must be something else.

Regards,
Max

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message William Yu 2005-01-27 22:00:30 Re: Splitting queries across servers
Previous Message Pavel Stehule 2005-01-27 21:47:09 Re: change encoding ?