Re: Why hash join cost calculation need reduction

From: 高健 <luckyjackgao(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Why hash join cost calculation need reduction
Date: 2013-06-14 05:16:39
Message-ID: CAL454F2Wb5KFmH_CkWpewByC6dFQ2N83M2hdwOCZ=b=gEcskbw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello:

Thanks for replying!

I understand it a little more.

And I compared the following statements:

First:

postgres=# explain analyze select * from sales s inner join customers c on
s.cust_id = c.cust_id;

QUERY
PLAN

------------------------------------------------------------------------------------------------------------------

Hash Join (cost=1.07..2.15 rows=3 width=84) (actual time=0.017..0.019
rows=3 loops=1)

Hash Cond: (s.cust_id = c.cust_id)

-> Seq Scan on sales s (cost=0.00..1.04 rows=4 width=42) (actual
time=0.004..0.004 rows=4 loops=1)

-> Hash (cost=1.03..1.03 rows=3 width=42) (actual time=0.004..0.004
rows=3 loops=1)

Buckets: 1024 Batches: 1 Memory Usage: 1kB

-> Seq Scan on customers c (cost=0.00..1.03 rows=3 width=42)
(actual time=0.001..0.001 rows=3 loops=1)

Total runtime: 0.046 ms

(7 rows)

Second:

postgres=# explain analyze select * from sales s inner join customers c on
s.cust_id = c.cust_id and c.cust_id =2;

QUERY
PLAN

------------------------------------------------------------------------------------------------------------

Nested Loop (cost=0.00..2.10 rows=1 width=84) (actual time=0.000..0.000
rows=1 loops=1)

-> Seq Scan on sales s (cost=0.00..1.05 rows=1 width=42) (actual
time=0.000..0.000 rows=1 loops=1)

Filter: (cust_id = 2)

Rows Removed by Filter: 3

-> Seq Scan on customers c (cost=0.00..1.04 rows=1 width=42) (actual
time=0.000..0.000 rows=1 loops=1)

Filter: (cust_id = 2)

Rows Removed by Filter: 2

Total runtime: 0.000 ms

(8 rows)

Third:

postgres=# explain analyze select * from sales s inner join customers c on
s.cust_id = c.cust_id and c.cust_id <4;

QUERY
PLAN

------------------------------------------------------------------------------------------------------------

Nested Loop (cost=0.00..2.13 rows=1 width=84) (actual time=0.014..0.018
rows=3 loops=1)

Join Filter: (s.cust_id = c.cust_id)

Rows Removed by Join Filter: 9

-> Seq Scan on customers c (cost=0.00..1.04 rows=1 width=42) (actual
time=0.007..0.007 rows=3 loops=1)

Filter: (cust_id < 4)

-> Seq Scan on sales s (cost=0.00..1.04 rows=4 width=42) (actual
time=0.000..0.000 rows=4 loops=3)

Total runtime: 0.038 ms

(7 rows)

postgres=#

The first sql statement and third sql statment really drive the
final_cost_hashjoin function to be called.

I think For the above third one,

cost_qual_eval(&hash_qual_cost, hashclauses, root) is for <s.cust_id =
c.cust_id>

And

cost_qual_eval(&qp_qual_cost, path->jpath.joinrestrictinfo, root) is for
<s.cust_id = c.cust_id and c.cust_id <4>

I've found the following calling relation:

hash_inner_and_outer à try_hashjoin_path à create_hashjoin_path
àfinal_cost_hashjoin

For the second sql statement ,

In the hash_inner_and_outer function,

the < if ( hashclauses) > condition is false, So there is no chance to
try a hashjoin path.

That is :

When I use the where condition such as <cust_id=2>,

postgresql is clever enough to know it is better to make seqscan and
filter ?

2013/6/13 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>

> Stephen Frost <sfrost(at)snowman(dot)net> writes:
> > * 高健 (luckyjackgao(at)gmail(dot)com) wrote:
> >> Why the reduction is needed here for cost calculation?
>
> > cost_qual_eval(&hash_qual_cost, hashclauses, root);
> > returns the costs for *just the quals which can be used for the
> > hashjoin*, while
> > cost_qual_eval(&qp_qual_cost, path->jpath.joinrestrictinfo, root);
> > returns the costs for *ALL the quals*
>
> Right. Note what it says in create_hashjoin_path:
>
> * 'restrict_clauses' are the RestrictInfo nodes to apply at the join
> ...
> * 'hashclauses' are the RestrictInfo nodes to use as hash clauses
> * (this should be a subset of the restrict_clauses list)
>
> So the two cost_qual_eval() calls are *both* counting the cost of the
> hashclauses, and we have to undo that to get at just the cost of any
> additional clauses beside the hash clauses. See the comment about the
> usage of qp_qual_cost further down:
>
> /*
> * For each tuple that gets through the hashjoin proper, we charge
> * cpu_tuple_cost plus the cost of evaluating additional restriction
> * clauses that are to be applied at the join. (This is pessimistic
> since
> * not all of the quals may get evaluated at each tuple.)
> */
> startup_cost += qp_qual_cost.startup;
> cpu_per_tuple = cpu_tuple_cost + qp_qual_cost.per_tuple;
> run_cost += cpu_per_tuple * hashjointuples;
>
> regards, tom lane
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message 高健 2013-06-14 08:49:06 prepared statement functioning range
Previous Message Tom Lane 2013-06-13 19:24:47 Re: Explicit LOAD and dynamic library loading