From: | Srinivas Karthik V <skarthikv(dot)iitb(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Effect of caching hash bucket size while costing |
Date: | 2016-12-08 08:53:27 |
Message-ID: | CAEfuzeSWU-V8WqnQ-uyAv85dgqvjCxoTOAoF46LEE+etTz2+rw@mail.gmail.com |
Lists: | pgsql-hackers |
Dear PostgreSQL Hackers,
I am working on the optimizer module of PostgreSQL 9.4.*. In the
final_cost_hashjoin() function in costsize.c, innerbucketsize is either:
a) taken from a cached copy
OR
b) calculated afresh from statistics by the following code snippet:
    thisbucketsize = estimate_hash_bucketsize(root,
                         get_leftop(restrictinfo->clause),
                         virtualbuckets);
For the query I used, if I disable the caching of innerbucketsize, I get a
different plan, with a cost change of around 1000 units.
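The following is a minimal, self-contained sketch of one way a cached estimate and a fresh one can disagree. It assumes the cache slot is filled on first use and then reused regardless of the bucket count passed in later calls. ClauseSketch, toy_bucketsize() and bucketsize_cached() are hypothetical names used only for illustration, and the toy estimator ignores any skew correction that the real estimate_hash_bucketsize() applies.

```c
#include <assert.h>

/* Hypothetical stand-in for a join clause; only the cache slot is modelled. */
typedef struct ClauseSketch
{
    double ndistinct;          /* assumed number of distinct join-key values */
    double cached_bucketsize;  /* negative means "not cached yet" */
} ClauseSketch;

/*
 * Toy estimator (an assumption, not the real function): the fraction of
 * inner rows expected in one bucket when keys are uniformly distributed.
 */
static double
toy_bucketsize(const ClauseSketch *clause, double nbuckets)
{
    double used;

    assert(nbuckets > 0 && clause->ndistinct > 0);
    /* No more buckets can be occupied than there are distinct keys. */
    used = (clause->ndistinct < nbuckets) ? clause->ndistinct : nbuckets;
    return 1.0 / used;
}

/* Memoized variant: compute once, then reuse whatever was stored. */
static double
bucketsize_cached(ClauseSketch *clause, double nbuckets)
{
    if (clause->cached_bucketsize < 0)
        clause->cached_bucketsize = toy_bucketsize(clause, nbuckets);
    /* nbuckets is ignored once a value has been cached. */
    return clause->cached_bucketsize;
}
```

In this sketch, with ndistinct = 1000, a first call with 100 buckets caches 0.01. A later call with 1000 buckets still returns 0.01, whereas a fresh computation would give 0.001. Whether this effect explains the plan change for the query here is an assumption and has not been verified.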
1) Can you please let me know whether innerbucketsize * innerpathrows
captures the maximum bucket size?
2) Why is it not calculated afresh every time?
For reference, below is the query I am using:
explain select i_item_id, avg(cs_quantity), avg(cs_list_price),
       avg(cs_coupon_amt), avg(cs_sales_price)
from catalog_sales, customer_demographics, date_dim, item, promotion
where cs_sold_date_sk = d_date_sk
  and cs_item_sk = i_item_sk
  and cs_bill_cdemo_sk = cd_demo_sk
  and cs_promo_sk = p_promo_sk
  and cd_gender = 'F'
  and cd_marital_status = 'U'
  and cd_education_status = 'Unknown'
  and (p_channel_email = 'N' or p_channel_event = 'N')
  and d_year = 2002
  and i_current_price <= 100
group by i_item_id
order by i_item_id
The hash clause that was tried was (item.i_item_sk =
catalog_sales.cs_item_sk).
Thanks,
Srinivas Karthik