From: | 高健 <luckyjackgao(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-general <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Why hash join cost calculation need reduction |
Date: | 2013-06-14 05:16:39 |
Message-ID: | CAL454F2Wb5KFmH_CkWpewByC6dFQ2N83M2hdwOCZ=b=gEcskbw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hello:
Thanks for replying!
I understand it a little more.
And I compared the following statements:
First:
postgres=# explain analyze select * from sales s inner join customers c on
s.cust_id = c.cust_id;
QUERY
PLAN
------------------------------------------------------------------------------------------------------------------
Hash Join (cost=1.07..2.15 rows=3 width=84) (actual time=0.017..0.019
rows=3 loops=1)
Hash Cond: (s.cust_id = c.cust_id)
-> Seq Scan on sales s (cost=0.00..1.04 rows=4 width=42) (actual
time=0.004..0.004 rows=4 loops=1)
-> Hash (cost=1.03..1.03 rows=3 width=42) (actual time=0.004..0.004
rows=3 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Seq Scan on customers c (cost=0.00..1.03 rows=3 width=42)
(actual time=0.001..0.001 rows=3 loops=1)
Total runtime: 0.046 ms
(7 rows)
Second:
postgres=# explain analyze select * from sales s inner join customers c on
s.cust_id = c.cust_id and c.cust_id =2;
QUERY
PLAN
------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..2.10 rows=1 width=84) (actual time=0.000..0.000
rows=1 loops=1)
-> Seq Scan on sales s (cost=0.00..1.05 rows=1 width=42) (actual
time=0.000..0.000 rows=1 loops=1)
Filter: (cust_id = 2)
Rows Removed by Filter: 3
-> Seq Scan on customers c (cost=0.00..1.04 rows=1 width=42) (actual
time=0.000..0.000 rows=1 loops=1)
Filter: (cust_id = 2)
Rows Removed by Filter: 2
Total runtime: 0.000 ms
(8 rows)
Third:
postgres=# explain analyze select * from sales s inner join customers c on
s.cust_id = c.cust_id and c.cust_id <4;
QUERY
PLAN
------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..2.13 rows=1 width=84) (actual time=0.014..0.018
rows=3 loops=1)
Join Filter: (s.cust_id = c.cust_id)
Rows Removed by Join Filter: 9
-> Seq Scan on customers c (cost=0.00..1.04 rows=1 width=42) (actual
time=0.007..0.007 rows=3 loops=1)
Filter: (cust_id < 4)
-> Seq Scan on sales s (cost=0.00..1.04 rows=4 width=42) (actual
time=0.000..0.000 rows=4 loops=3)
Total runtime: 0.038 ms
(7 rows)
postgres=#
The first sql statement and third sql statment really drive the
final_cost_hashjoin function to be called.
I think For the above third one,
cost_qual_eval(&hash_qual_cost, hashclauses, root) is for <s.cust_id =
c.cust_id>
And
cost_qual_eval(&qp_qual_cost, path->jpath.joinrestrictinfo, root) is for
<s.cust_id = c.cust_id and c.cust_id <4>
I've found the following calling relation:
hash_inner_and_outer à try_hashjoin_path à create_hashjoin_path
àfinal_cost_hashjoin
For the second sql statement ,
In the hash_inner_and_outer function,
the < if ( hashclauses) > condition is false, So there is no chance to
try a hashjoin path.
That is :
When I use the where condition such as <cust_id=2>,
postgresql is clever enough to know it is better to make seqscan and
filter ?
2013/6/13 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Stephen Frost <sfrost(at)snowman(dot)net> writes:
> > * 高健 (luckyjackgao(at)gmail(dot)com) wrote:
> >> Why the reduction is needed here for cost calculation?
>
> > cost_qual_eval(&hash_qual_cost, hashclauses, root);
> > returns the costs for *just the quals which can be used for the
> > hashjoin*, while
> > cost_qual_eval(&qp_qual_cost, path->jpath.joinrestrictinfo, root);
> > returns the costs for *ALL the quals*
>
> Right. Note what it says in create_hashjoin_path:
>
> * 'restrict_clauses' are the RestrictInfo nodes to apply at the join
> ...
> * 'hashclauses' are the RestrictInfo nodes to use as hash clauses
> * (this should be a subset of the restrict_clauses list)
>
> So the two cost_qual_eval() calls are *both* counting the cost of the
> hashclauses, and we have to undo that to get at just the cost of any
> additional clauses beside the hash clauses. See the comment about the
> usage of qp_qual_cost further down:
>
> /*
> * For each tuple that gets through the hashjoin proper, we charge
> * cpu_tuple_cost plus the cost of evaluating additional restriction
> * clauses that are to be applied at the join. (This is pessimistic
> since
> * not all of the quals may get evaluated at each tuple.)
> */
> startup_cost += qp_qual_cost.startup;
> cpu_per_tuple = cpu_tuple_cost + qp_qual_cost.per_tuple;
> run_cost += cpu_per_tuple * hashjointuples;
>
> regards, tom lane
>
From | Date | Subject | |
---|---|---|---|
Next Message | 高健 | 2013-06-14 08:49:06 | prepared statement functioning range |
Previous Message | Tom Lane | 2013-06-13 19:24:47 | Re: Explicit LOAD and dynamic library loading |