Re: Performance regression between 8.3 and 8.4 on heavy text indexing

From: gael(at)pilotsystems(dot)net ( Gaël Le Mignot)
To: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>
Cc: Gaël Le Mignot <gael(at)pilotsystems(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance regression between 8.3 and 8.4 on heavy text indexing
Date: 2009-08-26 16:03:34
Message-ID: plop87k50qtve1.fsf@aoskar.kilobug.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello Guillaume!

Sun, 23 Aug 2009 14:49:05 +0200, you wrote:

> Hi Gaël,
> On Fri, Aug 21, 2009 at 3:37 PM, Gaël Le Mignot<gael(at)pilotsystems(dot)net> wrote:
>> With 8.3 ::
>>
>>  Limit  (cost=752.67..752.67 rows=1 width=24)
>>  (11 rows)
>>
>> With 8.4 ::
>>  (8 rows)

> Could you provide us the EXPLAIN *ANALYZE* output of both plans?

Sure, here it is :

With 8.3 ::

libearticles=> explain analyze SELECT classname, id FROM libeindex WHERE (classname = 'article' AND (source IN ('methode','nica') AND (keywords_tsv @@ plainto_tsquery('french', 'assassinat') AND fulltext_tsv @@ to_tsquery('french', 'claude & duviau')))) ORDER BY publicationDate DESC,pageNumber ASC LIMIT 50;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=760.74..760.74 rows=1 width=24) (actual time=449.057..449.080 rows=9 loops=1)
-> Sort (cost=760.74..760.74 rows=1 width=24) (actual time=449.053..449.061 rows=9 loops=1)
Sort Key: publicationdate, pagenumber
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on libeindex (cost=756.71..760.73 rows=1 width=24) (actual time=420.704..448.571 rows=9 loops=1)
Recheck Cond: ((keywords_tsv @@ '''assassinat'''::tsquery) AND (fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery))
Filter: (((source)::text = ANY ('{methode,nica}'::text[])) AND ((classname)::text = 'article'::text))
-> BitmapAnd (cost=756.71..756.71 rows=1 width=0) (actual time=420.612..420.612 rows=0 loops=1)
-> Bitmap Index Scan on keywords_index (cost=0.00..48.96 rows=573 width=0) (actual time=129.338..129.338 rows=10225 loops=1)
Index Cond: (keywords_tsv @@ '''assassinat'''::tsquery)
-> Bitmap Index Scan on fulltext_index (cost=0.00..707.50 rows=573 width=0) (actual time=289.775..289.775 rows=14 loops=1)
Index Cond: (fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery)
Total runtime: 471.905 ms
(13 rows)

With 8.4 ::

libebench=> explain analyze SELECT classname, id FROM libeindex WHERE (classname = 'article' AND (source IN ('methode','nica') AND (keywords_tsv @@ plainto_tsquery('french', 'assassinat') AND fulltext_tsv @@ to_tsquery('french', 'claude & duviau')))) ORDER BY publicationDate DESC,pageNumber ASC LIMIT 50;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=758.51..758.51 rows=1 width=24) (actual time=50816.635..50816.660 rows=9 loops=1)
-> Sort (cost=758.51..758.51 rows=1 width=24) (actual time=50816.628..50816.637 rows=9 loops=1)
Sort Key: publicationdate, pagenumber
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on libeindex (cost=14.03..758.50 rows=1 width=24) (actual time=8810.133..50816.484 rows=9 loops=1)
Recheck Cond: (keywords_tsv @@ '''assassinat'''::tsquery)
Filter: (((source)::text = ANY ('{methode,nica}'::text[])) AND (fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery) AND ((classname)::text = 'article'::text))
-> Bitmap Index Scan on keywords_index (cost=0.00..14.03 rows=192 width=0) (actual time=158.563..158.563 rows=10222 loops=1)
Index Cond: (keywords_tsv @@ '''assassinat'''::tsquery)
Total runtime: 50817.040 ms
(10 rows)

So it seems it was quite wrong about estimated matching rows (192 predicted, 10222 reals).

>> From what I can see, one of the difference is that the estimates of
> the number of rows are / 3 for this part of the query:
> 8.3 -> Bitmap Index Scan on keywords_index (cost=0.00..48.97 rows=574 width=0)
> 8.4 -> Bitmap Index Scan on keywords_index (cost=0.00..14.03 rows=192 width=0)

> It might be interesting to see if 8.4 is right or not.

> Before 8.4, the selectivity for full text search was a constant (as
> you can see it in your 8.3 plan: the number of rows are equal in both
> bitmap index scan). 8.4 is smarter which might lead to other plans.

I see, thanks for your answer. What's weird is that this "smartness"
leads to overall worse results in our case, is there some tweaking we
can do? I didn't see anything in the documentation to change
weighting inside the text-match heuristic.

--
Gaël Le Mignot - gael(at)pilotsystems(dot)net
Pilot Systems - 9, rue Desargues - 75011 Paris
Tel : +33 1 44 53 05 55 - www.pilotsystems.net
Gérez vos contacts et vos newsletters : www.cockpit-mailing.com

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2009-08-26 16:29:45 Re: Performance regression between 8.3 and 8.4 on heavy text indexing
Previous Message Fred Janon 2009-08-26 15:08:52 Re: How to create a multi-column index with 2 dates using 'gist'?