From: | Steve Singer <ssinger(at)ca(dot)afilias(dot)info> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: slow bitmap heap scans on pg 9.2 |
Date: | 2013-04-13 16:14:10 |
Message-ID: | 51698452.1050202@ca.afilias.info |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On 13-04-12 09:20 PM, Jeff Janes wrote:
> On Thursday, April 11, 2013, Steve Singer wrote:
>
>
> I think the reason why it is picking the hash join based plans is
> because of
>
> Index Scan using table_b_1_ptid_orgid_ym_unq on table_b_1 b
> (cost=0.00..503.86 rows=1 width=10) (actual time=0.016..0.017 rows=1
> loops=414249)
> Index Cond: ((a.id <http://a.id> = a_id) AND
> (organization_id = 2) AND (year = 2013) AND (month = 3))
> Filter: (product_id = 1)
>
>
>
> Trying to reason about how the planner estimates costs for the inner
> side of nested loops makes my head hurt.
> So before doing that, could you run explain (analyze,buffers) on both of
> these much simpler (but hopefully morally equivalent to this planner
> node) sql:
>
> select * from table_b_1_b where a_id = <some plausible value> and
> organization_id=2 and year=2013 and month=3
>
> select * from table_b_1_b where a_id = <some plausible value> and
> organization_id=2 and year=2013 and month=3 and product_id=1
>
table_b_1 is a partition of table_b on product_id so when querying
table table_b_1 directly they are equivalent
explain (analyze,buffers) select * FROM table_b_1 where a_id=1128944 and
organization_id=2 and year=2013 and month=3;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
-----------
Index Scan using table_b_1_ptid_orgid_ym_unq on table_b_1
(cost=0.00..50.73 rows=1 width=56) (actual time=60.328..60.330 rows=
1 loops=1)
Index Cond: ((a_id = 1128944) AND (organization_id = 2) AND (year =
2013) AND (month = 3))
Buffers: shared hit=1 read=5
Total runtime: 60.378 ms
(4 rows)
The plans are the same if I do or do not specify the product_id in the
where clause (if I query the parent table and neglect to query the query
clause it of course queries all the other partitions)
>
> Of particular interest here is whether the estimate of 1 row is due to
> the specificity of the filter, or if the index clauses alone are
> specific enough to drive that estimate. (If you get many rows without
> the product_id filter, that would explain the high estimate.).
The index clauses alone , we normally expect 1 row back for a query like
that.
>
> Please run with the default cost parameters, or if you can't get the
> right plan with the defaults, specify what the used parameters were.
indexTotalCost += index->pages * spc_random_page_cost / 100000.0;
Is driving my high costs on the inner loop. The index has 2-5 million
pages depending on the partition . If I run this against 9.2.2 with /
10000.0 the estimate is even higher.
If I try this with this with the
*indexTotalCost += log(1.0 + index->pages / 10000.0) * spc_random_page_cost;
from 9.3 and I play I can make this work I can it pick the plan on some
partitions with product_id=2 but not product_id=1. If I remove the
fudge-factor cost adjustment line I get the nested-loop plan always.
Breaking the index into smaller partial indexes for each year seems to
be giving me the plans I want with random_page_cost=2 (I might also try
partial indexes on the month).
Even with the 9.3 log based fudge-factor we are seeing the fudge-factor
being big enough so that the planner is picking a table scan over the
index. A lot of loop iterations can be satisfied by cached pages of the
index the fudge-factor doesn't really account for this.
>
> Cheers,
>
> Jeff
From | Date | Subject | |
---|---|---|---|
Next Message | Rodrigo Barboza | 2013-04-13 17:29:13 | Re: Segment best size |
Previous Message | Julian | 2013-04-13 15:36:48 | Re: limit is sometimes not pushed in view with order |