Re: Apparently useless bitmap scans

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Apparently useless bitmap scans
Date: 2007-05-09 15:26:03
Message-ID: 200705091726.03455.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Am Mittwoch, 9. Mai 2007 16:29 schrieb Alvaro Herrera:
> Peter Eisentraut wrote:
> > There's another odd thing about this plan from yesterday.
>
> Is this still 8.2.1? The logic to choose bitmap indexes was rewritten
> just before 8.2.4,

OK, upgrading to 8.2.4 fixes this odd plan choice. The query does run
a bit faster too, but the cost estimate has actually gone up!

8.2.1:

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=87142.18..87366.58 rows=11220 width=184) (actual time=7883.541..8120.647 rows=35000 loops=1)
-> Sort (cost=87142.18..87170.23 rows=11220 width=184) (actual time=7883.471..7926.031 rows=35000 loops=1)
Sort Key: eh_subj.header_body
-> Hash Join (cost=46283.30..86387.42 rows=11220 width=184) (actual time=5140.182..7635.615 rows=35000 loops=1)
Hash Cond: (eh_subj.email_id = email.email_id)
-> Bitmap Heap Scan on email_header eh_subj (cost=11853.68..50142.87 rows=272434 width=104) (actual time=367.956..1719.736 rows=280989 loops=1)
Recheck Cond: ((mime_part_id = 0) AND (header_name = 'subject'::text))
-> BitmapAnd (cost=11853.68..11853.68 rows=27607 width=0) (actual time=326.507..326.507 rows=0 loops=1)
-> Bitmap Index Scan on idx__email_header__header_body_subject (cost=0.00..5836.24 rows=272434 width=0) (actual time=178.041..178.041 rows=280989 loops=1)
-> Bitmap Index Scan on idx__email_header__header_name (cost=0.00..5880.97 rows=281247 width=0) (actual time=114.574..114.574 rows=280989 loops=1)
Index Cond: (header_name = 'subject'::text)
-> Hash (cost=34291.87..34291.87 rows=11020 width=120) (actual time=4772.148..4772.148 rows=35000 loops=1)
-> Hash Join (cost=24164.59..34291.87 rows=11020 width=120) (actual time=3131.067..4706.997 rows=35000 loops=1)
Hash Cond: (mime_part.email_id = email.email_id)
-> Seq Scan on mime_part (cost=0.00..8355.81 rows=265804 width=12) (actual time=0.038..514.291 rows=267890 loops=1)
Filter: (mime_part_id = 0)
-> Hash (cost=24025.94..24025.94 rows=11092 width=112) (actual time=3130.982..3130.982 rows=35000 loops=1)
-> Hash Join (cost=22244.54..24025.94 rows=11092 width=112) (actual time=996.556..3069.280 rows=35000 loops=1)
Hash Cond: (eh_from.email_id = email.email_id)
-> Bitmap Heap Scan on email_header eh_from (cost=15576.58..16041.55 rows=107156 width=104) (actual time=569.762..1932.017 rows=280990 loops=1)
Recheck Cond: ((mime_part_id = 0) AND (header_name = 'from'::text))
-> BitmapAnd (cost=15576.58..15576.58 rows=160 width=0) (actual time=532.217..532.217 rows=0 loops=1)
-> Bitmap Index Scan on dummy_index (cost=0.00..3724.22 rows=107156 width=0) (actual time=116.386..116.386 rows=280990 loops=1)
-> Bitmap Index Scan on idx__email_header__from_local (cost=0.00..5779.24 rows=107156 width=0) (actual time=174.883..174.883 rows=280990 loops=1)
-> Bitmap Index Scan on dummy2_index (cost=0.00..5992.25 rows=107156 width=0) (actual time=173.575..173.575 rows=280990 loops=1)
-> Hash (cost=6321.79..6321.79 rows=27694 width=8) (actual time=426.739..426.739 rows=35000 loops=1)
-> Index Scan using idx__email__time on email (cost=0.00..6321.79 rows=27694 width=8) (actual time=50.000..375.021 rows=35000 loops=1)
Index Cond: (("time" >= '2007-05-05 17:01:59'::timestamp without time zone) AND ("time" < '2007-05-05 18:01:59'::timestamp without time zone))
Total runtime: 8160.442 ms

8.2.4:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=100086.52..100658.46 rows=28597 width=182) (actual time=6063.766..6281.818 rows=35000 loops=1)
-> Sort (cost=100086.52..100158.01 rows=28597 width=182) (actual time=6063.697..6105.215 rows=35000 loops=1)
Sort Key: eh_subj.header_body
-> Hash Join (cost=36729.27..97969.83 rows=28597 width=182) (actual time=3690.316..5790.094 rows=35000 loops=1)
Hash Cond: (eh_subj.email_id = email.email_id)
-> Bitmap Heap Scan on email_header eh_subj (cost=5903.20..63844.68 rows=267832 width=103) (actual time=214.699..1564.804 rows=280989 loops=1)
Recheck Cond: ((mime_part_id = 0) AND (header_name = 'subject'::text))
-> Bitmap Index Scan on idx__email_header__header_body_subject (cost=0.00..5836.24 rows=267832 width=0) (actual time=172.188..172.188 rows=280989 loops=1)
-> Hash (cost=30468.98..30468.98 rows=28567 width=119) (actual time=3475.484..3475.484 rows=35000 loops=1)
-> Hash Join (cost=13773.73..30468.98 rows=28567 width=119) (actual time=1260.579..3409.443 rows=35000 loops=1)
Hash Cond: (eh_from.email_id = email.email_id)
-> Index Scan using dummy_index on email_header eh_from (cost=0.00..13286.00 rows=277652 width=103) (actual time=0.076..1391.974 rows=280990 loops=1)
-> Hash (cost=13429.63..13429.63 rows=27528 width=20) (actual time=1260.422..1260.422 rows=35000 loops=1)
-> Hash Join (cost=1799.41..13429.63 rows=27528 width=20) (actual time=114.765..1206.500 rows=35000 loops=1)
Hash Cond: (mime_part.email_id = email.email_id)
-> Seq Scan on mime_part (cost=0.00..8355.81 rows=266589 width=12) (actual time=0.036..407.539 rows=267890 loops=1)
Filter: (mime_part_id = 0)
-> Hash (cost=1454.07..1454.07 rows=27627 width=8) (actual time=114.644..114.644 rows=35000 loops=1)
-> Index Scan using idx__email__time on email (cost=0.00..1454.07 rows=27627 width=8) (actual time=0.144..63.017 rows=35000 loops=1)
Index Cond: (("time" >= '2007-05-05 17:01:59'::timestamp without time zone) AND ("time" < '2007-05-05 18:01:59'::timestamp without time zone))
Total runtime: 6320.790 ms
(21 Zeilen)

The only significant change is that the first Bitmap Heap Scan (line 6)
became more expensive. You will notice that in the old plan, you had a
pretty good correspondence of 10 cost units to 1 millisecond throughout,
whereas in the new plan that does not apply to said Bitmap Heap Scan.
I'm not sure whether that is cause for concern.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2007-05-09 15:56:13 Re: Apparently useless bitmap scans
Previous Message Daniel Cristian Cruz 2007-05-09 14:51:58 Re: Nested loops overpriced