Re: [PERFORM] Slow TSearch2 performance for table with 1 million documents.

From: Benjamin Arai <benjamin(at)araisoft(dot)com>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL <pgsql-general(at)postgresql(dot)org>, pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] Slow TSearch2 performance for table with 1 million documents.
Date: 2007-10-05 22:57:31
Message-ID: 6936504E-3560-4BE9-86BE-BEDAB3CD355A@araisoft.com
Lists: pgsql-general pgsql-performance


On Oct 5, 2007, at 8:32 AM, Oleg Bartunov wrote:

> On Fri, 5 Oct 2007, Tom Lane wrote:
>
>> Benjamin Arai <benjamin(at)araisoft(dot)com> writes:
>>> # explain analyze select * FROM fulltext_article, to_tsquery('simple','dog') AS q
>>> WHERE idxfti @@ q ORDER BY rank(idxfti, q) DESC;
>>
>>> QUERY PLAN
>>> ------------------------------------------------------------------------
>>> Sort (cost=6576.74..6579.07 rows=933 width=774) (actual time=12969.237..12970.490 rows=5119 loops=1)
>>>   Sort Key: rank(fulltext_article.idxfti, q.q)
>>>   -> Nested Loop (cost=3069.79..6530.71 rows=933 width=774) (actual time=209.513..12955.498 rows=5119 loops=1)
>>>         -> Function Scan on q (cost=0.00..0.01 rows=1 width=32) (actual time=0.005..0.006 rows=1 loops=1)
>>>         -> Bitmap Heap Scan on fulltext_article (cost=3069.79..6516.70 rows=933 width=742) (actual time=209.322..234.390 rows=5119 loops=1)
>>>               Recheck Cond: (fulltext_article.idxfti @@ q.q)
>>>               -> Bitmap Index Scan on fulltext_article_idxfti_idx (cost=0.00..3069.56 rows=933 width=0) (actual time=208.373..208.373 rows=5119 loops=1)
>>>                     Index Cond: (fulltext_article.idxfti @@ q.q)
>>> Total runtime: 12973.035 ms
>>> (9 rows)
>>
>> The time seems all spent at the join step, which is odd because it
>> really hasn't got much to do. AFAICS all it has to do is compute the
>> rank() values that the sort step will use. Is it possible that
>> rank() is really slow?
>
> can you try rank_cd() instead ?
>
Using rank():

# explain analyze select * FROM fulltext_article, to_tsquery('simple','dog') AS q
-# WHERE idxfti @@ q ORDER BY rank(idxfti, q) DESC;

QUERY PLAN
------------------------------------------------------------------------
Sort (cost=6576.74..6579.07 rows=933 width=774) (actual time=98083.081..98084.351 rows=5119 loops=1)
  Sort Key: rank(fulltext_article.idxfti, q.q)
  -> Nested Loop (cost=3069.79..6530.71 rows=933 width=774) (actual time=479.122..98067.594 rows=5119 loops=1)
        -> Function Scan on q (cost=0.00..0.01 rows=1 width=32) (actual time=0.003..0.004 rows=1 loops=1)
        -> Bitmap Heap Scan on fulltext_article (cost=3069.79..6516.70 rows=933 width=742) (actual time=341.739..37112.110 rows=5119 loops=1)
              Recheck Cond: (fulltext_article.idxfti @@ q.q)
              -> Bitmap Index Scan on fulltext_article_idxfti_idx (cost=0.00..3069.56 rows=933 width=0) (actual time=321.443..321.443 rows=5119 loops=1)
                    Index Cond: (fulltext_article.idxfti @@ q.q)
Total runtime: 98087.575 ms
(9 rows)

Using rank_cd():

# explain analyze select * FROM fulltext_article, to_tsquery('simple','cat') AS q
-# WHERE idxfti @@ q ORDER BY rank_cd(idxfti, q) DESC;

QUERY PLAN
------------------------------------------------------------------------
Sort (cost=6576.74..6579.07 rows=933 width=774) (actual time=199316.648..199324.631 rows=26054 loops=1)
  Sort Key: rank_cd(fulltext_article.idxfti, q.q)
  -> Nested Loop (cost=3069.79..6530.71 rows=933 width=774) (actual time=871.428..199244.330 rows=26054 loops=1)
        -> Function Scan on q (cost=0.00..0.01 rows=1 width=32) (actual time=0.006..0.007 rows=1 loops=1)
        -> Bitmap Heap Scan on fulltext_article (cost=3069.79..6516.70 rows=933 width=742) (actual time=850.674..50146.477 rows=26054 loops=1)
              Recheck Cond: (fulltext_article.idxfti @@ q.q)
              -> Bitmap Index Scan on fulltext_article_idxfti_idx (cost=0.00..3069.56 rows=933 width=0) (actual time=838.120..838.120 rows=26054 loops=1)
                    Index Cond: (fulltext_article.idxfti @@ q.q)
Total runtime: 199338.297 ms
(9 rows)
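
For reference, a rough sketch of how the per-node times in the plans in this message break down. It relies on the fact that the "actual time" EXPLAIN ANALYZE reports for a node includes the time of its child nodes, so subtracting the children gives an estimate of the time spent in the node itself (every node here has loops=1). The dictionary layout, key names, and the exclusive_times() helper are made up for illustration; only the numbers come from the plans. Note that the rank() and rank_cd() runs use different search terms ('dog' vs 'cat') and return different row counts, so the per-row figure is probably the fairer comparison.

```python
# Upper bound of "actual time" (ms) for each node, copied from the three
# plans in this message. Key names are illustrative, not PostgreSQL terms.
PLANS = {
    "rank_dog_quoted": {   # plan quoted from the earlier message
        "rows": 5119, "nested_loop": 12955.498, "function_scan": 0.006,
        "heap_scan": 234.390, "index_scan": 208.373,
    },
    "rank_dog_rerun": {    # "Using rank()" above
        "rows": 5119, "nested_loop": 98067.594, "function_scan": 0.004,
        "heap_scan": 37112.110, "index_scan": 321.443,
    },
    "rank_cd_cat": {       # "Using rank_cd()" above
        "rows": 26054, "nested_loop": 199244.330, "function_scan": 0.007,
        "heap_scan": 50146.477, "index_scan": 838.120,
    },
}


def exclusive_times(plan):
    """Estimate time spent in each node itself by subtracting child times.

    Assumes loops=1 for every node, which holds for the plans above.
    """
    join_ms = plan["nested_loop"] - plan["function_scan"] - plan["heap_scan"]
    heap_ms = plan["heap_scan"] - plan["index_scan"]
    return {
        "join_ms": join_ms,                      # nested loop step on its own
        "heap_ms": heap_ms,                      # heap fetch on its own
        "index_ms": plan["index_scan"],          # leaf node, no children
        "join_ms_per_row": join_ms / plan["rows"],
    }


if __name__ == "__main__":
    for name, plan in PLANS.items():
        t = exclusive_times(plan)
        print(f"{name}: join={t['join_ms']:.3f} ms, heap={t['heap_ms']:.3f} ms, "
              f"index={t['index_ms']:.3f} ms, "
              f"join/row={t['join_ms_per_row']:.3f} ms")
```

In all three plans the nested loop step on its own accounts for most of the total runtime, which matches the observation quoted above that the time seems to be spent at the join step.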

>
>>
>> regards, tom lane
>>
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
>
