Why does query planner choose slower BitmapAnd ?

From: Seamus Abshere <seamus(at)abshere(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Why does query planner choose slower BitmapAnd ?
Date: 2016-02-22 15:18:41
Message-ID: 1456154321.976561.528310154.6A623C0E@webmail.messagingengine.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

hi,

I don't understand why the query planner is choosing a BitmapAnd when an
Index Scan followed by a filter is obviously better.

(Note that "new_york_houses" is a view of table "houses" with one
condition on city - and there is an index idx_houses_city. That is the
Index Scan that I think it should use exclusively.)

Here's a fast query that uses the Index Scan followed by a filter:

> => explain analyze SELECT COUNT(DISTINCT id) FROM new_york_houses WHERE roof_area >= 0 AND roof_area < 278.7091;
> QUERY PLAN
> ---
> Aggregate (cost=167298.10..167298.11 rows=1 width=16) (actual time=141.137..141.137 rows=1 loops=1)
> -> Index Scan using idx_houses_city on households (cost=0.57..167178.87 rows=47694 width=16) (actual time=0.045..105.953 rows=53971 loops=1)
> Index Cond: (city = 'New York'::text)
> Filter: ((roof_area >= 0) AND ((roof_area)::numeric < 278.7091))
> Rows Removed by Filter: 101719
> Planning time: 0.688 ms
> Execution time: 141.250 ms
> (7 rows)

When I add another condition, "phoneable", however, it chooses an
obviously wrong plan:

> => explain analyze SELECT COUNT(DISTINCT id) FROM new_york_houses WHERE roof_area >= 0 AND roof_area < 278.7091 AND phoneable = true;
> QUERY PLAN
> ---
> Aggregate (cost=128163.05..128163.06 rows=1 width=16) (actual time=4564.677..4564.677 rows=1 loops=1)
> -> Bitmap Heap Scan on households (cost=105894.80..128147.78 rows=6106 width=16) (actual time=4456.690..4561.416 rows=5183 loops=1)
> Recheck Cond: (city = 'New York'::text)
> Filter: (phoneable AND (roof_area >= 0) AND ((roof_area)::numeric < 278.7091))
> Rows Removed by Filter: 40103
> Heap Blocks: exact=14563
> -> BitmapAnd (cost=105894.80..105894.80 rows=21002 width=0) (actual time=4453.510..4453.510 rows=0 loops=1)
> -> Bitmap Index Scan on idx_houses_city (cost=0.00..1666.90 rows=164044 width=0) (actual time=16.505..16.505 rows=155690 loops=1)
> Index Cond: (city = 'New York'::text)
> -> Bitmap Index Scan on idx_houses_phoneable (cost=0.00..104224.60 rows=10271471 width=0) (actual time=4384.461..4384.461 rows=10647041 loops=1)
> Index Cond: (phoneable = true)
> Planning time: 0.709 ms
> Execution time: 4565.067 ms
> (13 rows)

On Postgres 9.4.4 with 244gb memory and SSDs

maintenance_work_mem 1000000
work_mem 500000
random_page_cost 1
seq_page_cost 2

The "houses" table has been analyzed recently and has statistics set to
the max.

Thanks,
Seamus

--
Seamus Abshere, SCEA
https://github.com/seamusabshere
https://www.linkedin.com/in/seamusabshere

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Victor Yegorov 2016-02-22 15:45:04 bpchar, text and indexes
Previous Message Stephen Frost 2016-02-22 14:58:08 Re: Why is my database so big?