From: | gael(at)pilotsystems(dot)net ( Gaël Le Mignot) |
---|---|
To: | Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gaël Le Mignot <gael(at)pilotsystems(dot)net>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Performance regression between 8.3 and 8.4 on heavy text indexing |
Date: | 2009-08-27 20:03:10 |
Message-ID: | plop87ocq1rpmp.fsf@aoskar.kilobug.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hello Guillaume!
Wed, 26 Aug 2009 23:59:25 +0200, you wrote:
> On Wed, Aug 26, 2009 at 6:29 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> gael(at)pilotsystems(dot)net (=?iso-8859-1?Q?Ga=EBl?= Le Mignot) writes:
>>> So it seems it was quite wrong about estimated matching rows (192 predicted, 10222 reals).
>>
>> Yup. What's even more interesting is that it seems the real win would
>> have been to use just the 'claude & duviau' condition (which apparently
>> matched only 14 rows). 8.3 had no hope whatever of understanding that,
>> it just got lucky. 8.4 should have figured it out, I'm thinking.
>> Does it help if you increase the statistics target for fulltext_tsv?
>> (Don't forget to re-ANALYZE after doing so.)
> It could be interesting to run the query without the condition
> (keywords_tsv @@ '''assassinat'''::tsquery) to see the estimate of
> (fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery) in 8.4.
Here it is ::
libebench=> explain analyze SELECT classname, id FROM libeindex WHERE (classname = 'article' AND (source IN ('methode','nica') AND fulltext_tsv @@ to_tsquery('french', 'claude & duviau'))) ORDER BY publicationDate DESC,pageNumber ASC LIMIT 50;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=12264.98..12265.11 rows=50 width=24) (actual time=3.799..3.825 rows=10 loops=1)
-> Sort (cost=12264.98..12271.03 rows=2421 width=24) (actual time=3.794..3.802 rows=10 loops=1)
Sort Key: publicationdate, pagenumber
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on libeindex (cost=2363.10..12184.56 rows=2421 width=24) (actual time=3.579..3.693 rows=10 loops=1)
Recheck Cond: (fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery)
Filter: (((source)::text = ANY ('{methode,nica}'::text[])) AND ((classname)::text = 'article'::text))
-> Bitmap Index Scan on fulltext_index (cost=0.00..2362.49 rows=2877 width=0) (actual time=3.499..3.499 rows=14 loops=1)
Index Cond: (fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery)
Total runtime: 166.772 ms
(10 rows)
So it estimates 2877 rows for that, while in reality it's 14.
> Btw, what Tom means by increasing the statistics is executing the
> following queries:
> ALTER TABLE libeindex ALTER COLUMN fulltext_tsv SET STATISTICS 500;
Ok, I did it for 500 also on the keywords_tsv column, which was the
other contestor. Here we have a clear improvement: the search in
keyword_tsv is now estimated at 10398 (real being 10222) and the one
on fulltext_tsv at 1 (real being 14).
I did it at 1000 too, it's almost the same result.
By re-running our sampling of 7334 queries on the database with the
statistics at 1000 on both fulltext_tsv and keywords_tsv, we do have
overall better results than with 8.3 ! So a greeat thanks to everyone.
The weird thing was that with the default of 100 for statistics
target, it was worse than when we moved back to 10. So I didn't try
with 1000, but I should have.
I'll do more tests and keep the list informed if it can be of any
help.
--
Gaël Le Mignot - gael(at)pilotsystems(dot)net
Pilot Systems - 9, rue Desargues - 75011 Paris
Tel : +33 1 44 53 05 55 - www.pilotsystems.net
Gérez vos contacts et vos newsletters : www.cockpit-mailing.com
From | Date | Subject | |
---|---|---|---|
Next Message | Guillaume Smet | 2009-08-27 22:20:34 | Re: Performance regression between 8.3 and 8.4 on heavy text indexing |
Previous Message | Tom Lane | 2009-08-27 18:05:13 | Re: PostgreSQL does CAST implicitely between int and a domain derived from int |