Re: How does PostgerSQL planner decide driving table

From: Emanuel Calvo <emanuel(dot)calvo(at)2ndquadrant(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: How does PostgerSQL planner decide driving table
Date: 2014-10-06 16:24:24
Message-ID: 5432C238.8070906@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


El 06/10/14 02:38, Shingo horiuchi escribió:
>
> Dear All,
>
>
>
> I have read about query tuning and attempt to check the impact of
>
> exchanging the driving table in Join condition.
>
>
>
> To test Simple Join condition, I prepared Two tables.
>
> One is the table for employees and another is the table for departments.
>
> Employees table has the foreign key which referencing departments table.
>
>
>
> The proportion of each table, filtered table and joined table is below:
>
> (E means employees table and D means departments table.)
>
> #rows in E #rows in D #filtered rows in
> E #filtered rows in D #rows in E
> and D
>
> 10000 490
> 1000
> 245 9800
>
>
>
> After make the index on the filtered column, I tried the query:
>
> EXPLAIN ANALYZE
>
> SELECT D.Department_Name, E.Last_Name, E.First_Name
>
> FROM Employees E, Departments D
>
> WHERE E.Department_Id=D.Department_Id
>
> AND E.Exempt_Flag='Y'
>
> AND D.US_Based_Flag='Y'
>
> ;
>
>
>
> Result was:
>
> Hash Join (cost=8.85..241.59 rows=499 width=15) (actual
> time=0.105..2.052 rows=518 loops=1)
>
> Hash Cond: (e.department_id = d.department_id)
>
> -> Seq Scan on employees e (cost=0.00..209.00 rows=5000 width=17)
> (actual time=0.007..1.541 rows=5000 loops=1)
>
> Filter: (exempt_flag = 'Y'::bpchar)
>
> Rows Removed by Filter: 5000
>
> -> Hash (cost=8.24..8.24 rows=49 width=14) (actual
> time=0.087..0.087 rows=49 loops=1)
>
> Buckets: 1024 Batches: 1 Memory Usage: 3kB
>
> -> Bitmap Heap Scan on departments d (cost=4.63..8.24
> rows=49 width=14) (actual time=0.069..0.078 rows=49 loops=1)
>
> Recheck Cond: (us_based_flag = 'Y'::bpchar)
>
> -> Bitmap Index Scan on dept2_flg_idx
> (cost=0.00..4.62 rows=49 width=0) (actual time=0.063..0.063 rows=49
> loops=1)
>
> Index Cond: (us_based_flag = 'Y'::bpchar)
>
> Total runtime: 2.095 ms
>

I would recommend to run an ANALYZE on both tables.

You can't exchange the seqscan table on that query due that it needs to
read ALL the
records on E table. If you seqscan D first, the engine needs to read
again all the E
records.

If you want all the employees across all departments, you'll always end
up reading all
the employees.

What you can do to other plans, is disabling enable_hashjoin.

> */ /*
>
> In order to exchange the driving table, I tried the query:
>
> EXPLAIN ANALYZE
>
> SELECT D.Department_Name, E.Last_Name, E.First_Name
>
> FROM Departments D, Employees E
>
> WHERE D.Department_Id=E.Department_Id
>
> AND E.Exempt_Flag='Y'
>
> AND D.US_Based_Flag='Y'
>
> ;
>
>
>
> However, the result was same.
>
> I think this is because the query planner can optimizer the 2^nd
> query based on table statistics of E and D.
>
> E being the larger number of records and has higher filtering rate so
> it continues to be driving table.
>
>
>
> Now, I tried another test case to confirm my assumption.
>
> The proportion of another test case is different from above one.
>
> The proportion of each table, filtered table and joined table is below:
>
> (E means employees table and D means departments table.)
>
> #rows in E #rows in D #filtered rows in
> E #filtered rows in D #rows in E
> and D
>
> 10000
> 490 5000
> 49 9800
>
> The important point is the difference in the filtering rate.
>
> In this case, departments table is higher filtering rate,
>
> so taking departments table as driving table will be able to cut the
> computational cost, I think.
>
>
>
> I tried same query:
>
> EXPLAIN ANALYZE
>
> SELECT D.Department_Name, E.Last_Name, E.First_Name
>
> FROM Employees E, Departments D
>
> WHERE E.Department_Id=D.Department_Id
>
> AND E.Exempt_Flag='Y'
>
> AND D.US_Based_Flag='Y'
>
> And
>
> EXPLAIN ANALYZE
>
> SELECT D.Department_Name, E.Last_Name, E.First_Name
>
> FROM Employees E, Departments D
>
> WHERE E.Department_Id=D.Department_Id
>
> AND E.Exempt_Flag='Y'
>
> AND D.US_Based_Flag='Y'
>
>
>

Both queries are identical (?).

--
--
Emanuel Calvo http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Igor Neyman 2014-10-06 16:54:50 Re: How to get good performance for very large lists/sets?
Previous Message Melvin Davidson 2014-10-06 15:25:28 Re: Converting char to varchar automatically