From: | gael(at)pilotsystems(dot)net ( Gaël Le Mignot) |
---|---|
To: | Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com> |
Cc: | Gaël Le Mignot <gael(at)pilotsystems(dot)net>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Performance regression between 8.3 and 8.4 on heavy text indexing |
Date: | 2009-08-26 16:03:34 |
Message-ID: | plop87k50qtve1.fsf@aoskar.kilobug.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hello Guillaume!
Sun, 23 Aug 2009 14:49:05 +0200, you wrote:
> Hi Gaël,
> On Fri, Aug 21, 2009 at 3:37 PM, Gaël Le Mignot<gael(at)pilotsystems(dot)net> wrote:
>> With 8.3 ::
>>
>> Limit (cost=752.67..752.67 rows=1 width=24)
>> (11 rows)
>>
>> With 8.4 ::
>> (8 rows)
> Could you provide us the EXPLAIN *ANALYZE* output of both plans?
Sure, here it is :
With 8.3 ::
libearticles=> explain analyze SELECT classname, id FROM libeindex WHERE (classname = 'article' AND (source IN ('methode','nica') AND (keywords_tsv @@ plainto_tsquery('french', 'assassinat') AND fulltext_tsv @@ to_tsquery('french', 'claude & duviau')))) ORDER BY publicationDate DESC,pageNumber ASC LIMIT 50;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=760.74..760.74 rows=1 width=24) (actual time=449.057..449.080 rows=9 loops=1)
-> Sort (cost=760.74..760.74 rows=1 width=24) (actual time=449.053..449.061 rows=9 loops=1)
Sort Key: publicationdate, pagenumber
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on libeindex (cost=756.71..760.73 rows=1 width=24) (actual time=420.704..448.571 rows=9 loops=1)
Recheck Cond: ((keywords_tsv @@ '''assassinat'''::tsquery) AND (fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery))
Filter: (((source)::text = ANY ('{methode,nica}'::text[])) AND ((classname)::text = 'article'::text))
-> BitmapAnd (cost=756.71..756.71 rows=1 width=0) (actual time=420.612..420.612 rows=0 loops=1)
-> Bitmap Index Scan on keywords_index (cost=0.00..48.96 rows=573 width=0) (actual time=129.338..129.338 rows=10225 loops=1)
Index Cond: (keywords_tsv @@ '''assassinat'''::tsquery)
-> Bitmap Index Scan on fulltext_index (cost=0.00..707.50 rows=573 width=0) (actual time=289.775..289.775 rows=14 loops=1)
Index Cond: (fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery)
Total runtime: 471.905 ms
(13 rows)
With 8.4 ::
libebench=> explain analyze SELECT classname, id FROM libeindex WHERE (classname = 'article' AND (source IN ('methode','nica') AND (keywords_tsv @@ plainto_tsquery('french', 'assassinat') AND fulltext_tsv @@ to_tsquery('french', 'claude & duviau')))) ORDER BY publicationDate DESC,pageNumber ASC LIMIT 50;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=758.51..758.51 rows=1 width=24) (actual time=50816.635..50816.660 rows=9 loops=1)
-> Sort (cost=758.51..758.51 rows=1 width=24) (actual time=50816.628..50816.637 rows=9 loops=1)
Sort Key: publicationdate, pagenumber
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on libeindex (cost=14.03..758.50 rows=1 width=24) (actual time=8810.133..50816.484 rows=9 loops=1)
Recheck Cond: (keywords_tsv @@ '''assassinat'''::tsquery)
Filter: (((source)::text = ANY ('{methode,nica}'::text[])) AND (fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery) AND ((classname)::text = 'article'::text))
-> Bitmap Index Scan on keywords_index (cost=0.00..14.03 rows=192 width=0) (actual time=158.563..158.563 rows=10222 loops=1)
Index Cond: (keywords_tsv @@ '''assassinat'''::tsquery)
Total runtime: 50817.040 ms
(10 rows)
So it seems it was quite wrong about estimated matching rows (192 predicted, 10222 reals).
>> From what I can see, one of the difference is that the estimates of
> the number of rows are / 3 for this part of the query:
> 8.3 -> Bitmap Index Scan on keywords_index (cost=0.00..48.97 rows=574 width=0)
> 8.4 -> Bitmap Index Scan on keywords_index (cost=0.00..14.03 rows=192 width=0)
> It might be interesting to see if 8.4 is right or not.
> Before 8.4, the selectivity for full text search was a constant (as
> you can see it in your 8.3 plan: the number of rows are equal in both
> bitmap index scan). 8.4 is smarter which might lead to other plans.
I see, thanks for your answer. What's weird is that this "smartness"
leads to overall worse results in our case, is there some tweaking we
can do? I didn't see anything in the documentation to change
weighting inside the text-match heuristic.
--
Gaël Le Mignot - gael(at)pilotsystems(dot)net
Pilot Systems - 9, rue Desargues - 75011 Paris
Tel : +33 1 44 53 05 55 - www.pilotsystems.net
Gérez vos contacts et vos newsletters : www.cockpit-mailing.com
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2009-08-26 16:29:45 | Re: Performance regression between 8.3 and 8.4 on heavy text indexing |
Previous Message | Fred Janon | 2009-08-26 15:08:52 | Re: How to create a multi-column index with 2 dates using 'gist'? |