From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Srinivas Karthik V <skarthikv(dot)iitb(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Effect of caching hash bucket size while costing |
Date: | 2016-12-10 18:38:27 |
Message-ID: | CA+TgmoZkOZqzyDQ8dKmEAKMf5RDHoN2Rnu73wbauM4p_93Oh=Q@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Dec 8, 2016 at 3:53 AM, Srinivas Karthik V
<skarthikv(dot)iitb(at)gmail(dot)com> wrote:
> Dear PostgreSQL Hackers,
>
> I am working on the optimizer module of PostgreSQL 9.4.*. In the
> final_cost_hashjoin() function in costsize.c, the innerbucketsize is either:
>
> a) calculated using a cached copy
> OR
> b) calculated afresh using statistics captured by the following code
> snippet:
> thisbucketsize = estimate_hash_bucketsize(root,
> get_leftop(restrictinfo->clause),virtualbuckets);
>
> For the query I used, if I disable the caching for calculating the
> innerbucketsize, I get a different plan with cost change of around 1000
> units.
>
> 1) Can you please let me know if innerbucketsize*innerpathrows captures the
> maximum bucket size?
> 2) Why is it not calculated afresh all the time?
Well, #2 is answered right there in the comments:
     * Since we tend to visit the same clauses over and over when
     * planning a large query, we cache the bucketsize estimate in the
     * RestrictInfo node to avoid repeated lookups of statistics.
I assume the person who wrote the comment thought that the answer
wouldn't change from one call to the next, and therefore it was safe
to cache. I don't know why that isn't the case for you.
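
To make the caching pattern in that comment concrete, here is a minimal sketch in C. It is a toy illustration, not the code in costsize.c: the ClauseCache struct, toy_estimate_bucketsize(), and the use of a negative value as the "not computed yet" marker are all assumptions made for this example.

```c
#include <assert.h>

/* Hypothetical stand-in for a node that carries a cached estimate.
 * This is NOT PostgreSQL's RestrictInfo; it exists only for this sketch. */
typedef struct ClauseCache
{
    double bucketsize;  /* cached estimate; negative means "not computed yet" */
    int    lookups;     /* how many times the expensive path actually ran */
} ClauseCache;

/* Hypothetical "expensive" estimator. Here it is just the uniform
 * fraction 1/nbuckets; a real estimator would consult statistics. */
double
toy_estimate_bucketsize(double nbuckets)
{
    assert(nbuckets > 0);
    return 1.0 / nbuckets;
}

/* Mark the cache as empty. */
void
clause_cache_init(ClauseCache *c)
{
    c->bucketsize = -1.0;
    c->lookups = 0;
}

/* Return the cached estimate, computing it only on first use. */
double
cached_bucketsize(ClauseCache *c, double nbuckets)
{
    if (c->bucketsize < 0)
    {
        /* not cached yet */
        c->bucketsize = toy_estimate_bucketsize(nbuckets);
        c->lookups++;
    }
    return c->bucketsize;
}
```

In this toy version, a second call with a different nbuckets still returns the first answer, because the cache is not keyed on that argument. That is a property of the sketch only; it does not establish what happens in the planner for the query in question.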
As to question #1, there's a comment for that, too, a little further down:
     * The number of tuple comparisons needed is the number of outer
     * tuples times the typical number of tuples in a hash bucket, which
     * is the inner relation size times its bucketsize fraction. At each
     * one, we need to evaluate the hashjoin quals. But actually,
So innerbucketsize*innerpathrows represents the number of comparisons
that we expect to need to perform per hash probe.
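
As a worked example of that arithmetic, the sketch below multiplies the quantities exactly as the quoted part of the comment describes them. The function names and the row counts are hypothetical, and the sketch stops where the excerpt stops: it does not model whatever adjustment follows the trailing "But actually," in the real comment.

```c
#include <assert.h>

/* Expected number of inner tuples compared per hash probe:
 * inner relation size times its bucketsize fraction.
 * Hypothetical helper for illustration only. */
double
comparisons_per_probe(double inner_rows, double bucket_fraction)
{
    assert(inner_rows >= 0);
    assert(bucket_fraction >= 0 && bucket_fraction <= 1);
    return inner_rows * bucket_fraction;
}

/* Total tuple comparisons: one probe per outer tuple, each probe
 * examining the expected bucket contents. Hypothetical helper. */
double
total_comparisons(double outer_rows, double inner_rows, double bucket_fraction)
{
    assert(outer_rows >= 0);
    return outer_rows * comparisons_per_probe(inner_rows, bucket_fraction);
}
```

With made-up inputs of 20480 inner rows and a bucketsize fraction of 1/1024, comparisons_per_probe() gives 20; with 5000 outer rows, total_comparisons() gives 100000.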
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company