Quick Links

Full text search with ORDER BY performance issue

From:	Krade <krade(at)krade(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Full text search with ORDER BY performance issue
Date:	2009-07-18 22:07:56
Message-ID:	4A6247BC.2070108@krade.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Hello,

I'm having a bit of an issue with full text search (using tsvectors) on
PostgreSQL 8.4. I have a rather large table (around 12M rows) and want
to use full text search in it (just for one of the columns). Just doing
a plainto_tsquery works reasonably fast (I have a GIN index on the
column in question, "comment_tsv"), but it becomes unbearably slow when
I want to make it order by another field ("timestamp").

Here's an example query:
SELECT * FROM a WHERE comment_tsv @@ plainto_tsquery('love') ORDER BY
timestamp DESC LIMIT 24 OFFSET 0;

I asked in #postgresql and was told that there are two possible plans
for this query; the first scans the BTREE timestamp index, gets the
ordering and then filters out the rows using text search; the second
finds all rows matching the text search using the GIN index and then
sorts them according to that field -- this much I already knew, in fact,
I had to drop the BTREE index on "timestamp" to prevent the planner from
choosing the first, since the first plan is completely useless to me,
considering the table is so huge (suggestions on how to prevent the
planner from picking the "wrong" plan are also appreciated).

Obviously, this gets really slow when I try to search for common words
and full text search returns a lot of rows to be ordered.

I tried to make a GIN index on ("timestamp", "comment_tsv"), (using
btree_gin from contrib) but it doesn't really do anything -- I was told
on IRC this is because GIN doesn't provide support for ORDER BY, only
BTREE can do that.

Here's a couple of queries:

archive=> explain analyze select * from a where comment_tsv @@
plainto_tsquery('love') order by timestamp desc limit 24 offset 0;

QUERY PLAN
----------
Limit (cost=453248.73..453248.79 rows=24 width=281) (actual
time=188441.047..188441.148 rows=24 loops=1)
-> Sort (cost=453248.73..453882.82 rows=253635 width=281) (actual
time=188441.043..188441.079 rows=24 loops=1)
Sort Key: "timestamp"
Sort Method: top-N heapsort Memory: 42kB
-> Bitmap Heap Scan on a (cost=17782.16..446166.02
rows=253635 width=281) (actual time=2198.930..187948.050 rows=256378
loops=1)
Recheck Cond: (comment_tsv @@ plainto_tsquery('love'::text))
-> Bitmap Index Scan on timestamp_comment_gin
(cost=0.00..17718.75 rows=253635 width=0) (actual
time=2113.664..2113.664 rows=259828 loops=1)
Index Cond: (comment_tsv @@
plainto_tsquery('love'::text))
Total runtime: 188442.617 ms
(9 rows)

archive=> explain analyze select * from a where comment_tsv @@
plainto_tsquery('love') limit 24 offset 0;

QUERY PLAN
----------
Limit (cost=0.00..66.34 rows=24 width=281) (actual
time=14.632..53.647 rows=24 loops=1)
-> Seq Scan on a (cost=0.00..701071.49 rows=253635 width=281)
(actual time=14.629..53.588 rows=24 loops=1)
Filter: (comment_tsv @@ plainto_tsquery('love'::text))
Total runtime: 53.731 ms
(4 rows)

First one runs painfully slow.

Is there really no way to have efficient full text search results
ordered by a separate field? I'm really open to all possibilities, at
this point.

Thanks.

Responses

Re: Full text search with ORDER BY performance issue at 2009-07-20 12:12:20 from Oleg Bartunov
Re: Full text search with ORDER BY performance issue at 2009-07-20 12:22:03 from Marcin Stępnicki
Re: Full text search with ORDER BY performance issue at 2009-07-20 21:42:31 from Kevin Grittner
Re: Full text search with ORDER BY performance issue at 2009-07-21 01:13:43 from Devin Ben-Hur
Re: Full text search with ORDER BY performance issue at 2014-07-28 19:55:13 from worthy7

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Robert James	2009-07-19 23:03:09	Can Postgres use an INDEX over an OR?
Previous Message	Haszlakiewicz, Eric	2009-07-17 19:13:26	Re: [PERFORM] Concurrency issue under very heay loads