Apparently useless bitmap scans

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Apparently useless bitmap scans
Date: 2007-05-09 12:10:57
Message-ID: 200705091410.57312.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

There's another odd thing about this plan from yesterday.

Query:

SELECT
eh_subj.header_body AS subject,
count(distinct eh_from.header_body)
FROM
email JOIN mime_part USING (email_id)
JOIN email_header eh_subj USING (email_id, mime_part_id)
JOIN email_header eh_from USING (email_id, mime_part_id)
WHERE
eh_subj.header_name = 'subject'
AND eh_from.header_name = 'from'
AND mime_part_id = 0
AND (time >= timestamp '2007-05-05 17:01:59' AND time < timestamp '2007-05-05 17:01:59' + interval '60 min')
GROUP BY
eh_subj.header_body;

Plan:

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=1920309.81..1920534.21 rows=11220 width=184) (actual time=5349.493..5587.536 rows=35000 loops=1)
-> Sort (cost=1920309.81..1920337.86 rows=11220 width=184) (actual time=5349.427..5392.110 rows=35000 loops=1)
Sort Key: eh_subj.header_body
-> Nested Loop (cost=15576.58..1919555.05 rows=11220 width=184) (actual time=537.938..5094.377 rows=35000 loops=1)
-> Nested Loop (cost=15576.58..475387.23 rows=11020 width=120) (actual time=537.858..4404.330 rows=35000 loops=1)
-> Nested Loop (cost=15576.58..430265.44 rows=11092 width=112) (actual time=537.768..4024.184 rows=35000 loops=1)
-> Bitmap Heap Scan on email_header eh_from (cost=15576.58..16041.55 rows=107156 width=104) (actual time=537.621..1801.032 rows=280990 loops=1)
Recheck Cond: ((mime_part_id = 0) AND (header_name = 'from'::text))
-> BitmapAnd (cost=15576.58..15576.58 rows=160 width=0) (actual time=500.006..500.006 rows=0 loops=1)
-> Bitmap Index Scan on dummy_index (cost=0.00..3724.22 rows=107156 width=0) (actual time=85.025..85.025 rows=280990 loops=1)
-> Bitmap Index Scan on idx__email_header__from_local (cost=0.00..5779.24 rows=107156 width=0) (actual time=173.006..173.006 rows=280990 loops=1)
-> Bitmap Index Scan on dummy2_index (cost=0.00..5992.25 rows=107156 width=0) (actual time=174.463..174.463 rows=280990 loops=1)
-> Index Scan using email_pkey on email (cost=0.00..3.85 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=280990)
Index Cond: (email.email_id = eh_from.email_id)
Filter: (("time" >= '2007-05-05 17:01:59'::timestamp without time zone) AND ("time" < '2007-05-05 18:01:59'::timestamp without time zone))
-> Index Scan using mime_part_pkey on mime_part (cost=0.00..4.06 rows=1 width=12) (actual time=0.005..0.006 rows=1 loops=35000)
Index Cond: ((email.email_id = mime_part.email_id) AND (mime_part.mime_part_id = 0))
-> Index Scan using idx__email_header__email_id__mime_part_id on email_header eh_subj (cost=0.00..130.89 rows=13 width=104) (actual time=0.009..0.015 rows=1 loops=35000)
Index Cond: ((email.email_id = eh_subj.email_id) AND (0 = eh_subj.mime_part_id))
Filter: (header_name = 'subject'::text)
Total runtime: 5625.024 ms

I'm wondering what it wants to achieve with these three index scans:

-> Bitmap Index Scan on dummy_index (cost=0.00..3724.22 rows=107156 width=0) (actual time=85.025..85.025 rows=280990 loops=1)
-> Bitmap Index Scan on idx__email_header__from_local (cost=0.00..5779.24 rows=107156 width=0) (actual time=173.006..173.006 rows=280990 loops=1)
-> Bitmap Index Scan on dummy2_index (cost=0.00..5992.25 rows=107156 width=0) (actual time=174.463..174.463 rows=280990 loops=1)

The indexes in question are:

CREATE INDEX dummy_index ON email_header ((555)) WHERE mime_part_id = 0 AND header_name = 'from';
CREATE INDEX dummy2_index ON email_header (substr(header_body,5)) WHERE mime_part_id = 0 AND header_name = 'from';
CREATE INDEX idx__email_header__from_local ON email_header (get_localpart(header_body)) WHERE mime_part_id = 0 AND header_name = 'from';

It appears to want to use these indexes to get the restriction

AND eh_from.header_name = 'from'
AND mime_part_id = 0

from the query, but why does it need three of them to do it, when all
of them have the same predicate and none of them has an indexed
expression that appears in the query?

There are more partial indexes with the same predicate, but it appears
to always use three. (The two "dummy" indexes are just leftovers from
these experiments.)

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Valentine Gogichashvili 2007-05-09 13:12:45 Cannot make GIN intarray index be used by the planner
Previous Message Peter Eisentraut 2007-05-09 11:58:25 Re: Nested loops overpriced