From: David Osborne <david(at)qcode(dot)co(dot)uk>
To: pgsql-performance(at)postgresql(dot)org
Subject: Slow 3 Table Join with v bad row estimate
Date: 2015-11-10 12:13:21
Message-ID: CAKmpXCciVMDYTtRRF1ADHJU=3MN52QKGk--78W4NnHw3wrzFLQ@mail.gmail.com
Lists: pgsql-performance
We're hoping to get some suggestions for improving the performance of a
3-table join we're carrying out.
(I've stripped out some schema info to keep this post from getting too
convoluted - if something doesn't make sense, it may be that I've erroneously
taken out something significant.)
The 3 tables and indices are:
\d branch_purchase_order
                           Table "public.branch_purchase_order"
   Column    |              Type              |                       Modifiers
-------------+--------------------------------+-----------------------------------------------------------------------
 po_id       | integer                        | not null default nextval('branch_purchase_order_po_id_seq'::regclass)
 branch_code | character(2)                   |
 po_number   | character varying(20)          |
 supplier    | character varying(50)          |
 order_date  | timestamp(0) without time zone |
 po_state    | character varying(10)          |
Indexes:
    "branch_purchase_order_pkey" PRIMARY KEY, btree (po_id)
    "branch_po_unique_order_no_idx" UNIQUE, btree (branch_code, po_number)
    "branch_po_no_idx" btree (po_number)
    "branch_po_state_idx" btree (po_state)
Referenced by:
    TABLE "branch_purchase_order_products" CONSTRAINT "branch_purchase_order_products_po_id_fkey" FOREIGN KEY (po_id) REFERENCES branch_purchase_order(po_id) ON DELETE CASCADE
\d branch_purchase_order_products
    Table "public.branch_purchase_order_products"
    Column     |          Type          | Modifiers
---------------+------------------------+-----------
 po_id         | integer                |
 product_code  | character varying(20)  |
 date_received | date                   |
Indexes:
    "branch_purchase_order_product_code_idx" btree (product_code)
    "branch_purchase_order_product_po_idx" btree (po_id)
    "branch_purchase_order_products_date_received_idx" btree (date_received)
Foreign-key constraints:
    "branch_purchase_order_products_po_id_fkey" FOREIGN KEY (po_id) REFERENCES branch_purchase_order(po_id) ON DELETE CASCADE
\d stocksales_ib
              Table "public.stocksales_ib"
    Column    |              Type              | Modifiers
--------------+--------------------------------+-----------
 row_id       | integer                        |
 branch_code  | character(2)                   |
 product_code | character varying(20)          |
 invoice_date | timestamp(0) without time zone |
 qty          | integer                        |
 order_no     | character varying(30)          |
Indexes:
    "ssales_ib_branch_idx" btree (branch_code)
    "ssales_ib_invoice_date_date_idx" btree ((invoice_date::date))
    "ssales_ib_invoice_date_idx" btree (invoice_date)
    "ssales_ib_order_no" btree (order_no)
    "ssales_ib_product_idx" btree (product_code)
    "ssales_ib_replace_order_no" btree (replace(order_no::text, ' '::text, ''::text))
    "ssales_ib_row_idx" btree (row_id)
    "stocksales_ib_branch_code_row_id_idx" btree (branch_code, row_id)
    "stocksales_ib_substring_idx" btree ("substring"(replace(order_no::text, ' '::text, ''::text), 3, 2))
The join we're using is:
branch_purchase_order o
join branch_purchase_order_products p using(po_id)
join stocksales_ib ss on o.supplier=ss.branch_code
and p.product_code=ss.product_code
and X
There are 3 variants of the final join condition X (we UNION 3 subqueries
together), but the one causing the issues is:
(o.branch_code || o.po_number = replace(ss.order_no,' ',''))
which joins branch_purchase_order to stocksales_ib under the following
circumstances:
  ss.order_no  | o.branch_code | o.po_number
---------------+---------------+-------------
 AA IN105394   | AA            | IN105394
 BB IN105311   | BB            | IN105311
 CC IN105311   | CC            | IN105311
 DD IN105310   | DD            | IN105310
 EE IN105310   | EE            | IN105310
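Taking the first row above, the condition strips the space out of ss.order_no
before comparing it with the concatenated key (a trivial illustration using
literals, not the real tables):

```sql
-- Illustration only: the space is removed from ss.order_no, and the result
-- is compared with branch_code and po_number concatenated together.
SELECT replace('AA IN105394', ' ', '') = 'AA' || 'IN105394' AS matches;  -- true
```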
The entire query (leaving aside the UNION'ed subqueries for readability)
looks like this:
select
po_id,
product_code,
sum(qty) as dispatch_qty,
max(invoice_date) as dispatch_date,
count(invoice_date) as dispatch_count
from (
select
o.po_id,
p.product_code,
ss.qty,
ss.invoice_date
from
branch_purchase_order o
join branch_purchase_order_products p using(po_id)
join stocksales_ib ss on o.supplier=ss.branch_code
and p.product_code=ss.product_code
and (o.branch_code || o.po_number=replace(ss.order_no,' ',''))
where
o.po_state='PLACED'
and o.supplier='XX'
) x
group by po_id,product_code
Explain output:
http://explain.depesz.com/s/TzF8h
                                                    QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=83263.72..83263.73 rows=1 width=24) (actual time=23908.777..23927.461 rows=52500 loops=1)
   Buffers: shared hit=23217993 dirtied=1
   ->  Nested Loop  (cost=1.29..83263.71 rows=1 width=24) (actual time=0.196..23799.930 rows=53595 loops=1)
         Join Filter: (o.po_id = p.po_id)
         Rows Removed by Join Filter: 23006061
         Buffers: shared hit=23217993 dirtied=1
         ->  Nested Loop  (cost=0.86..57234.41 rows=3034 width=23) (actual time=0.162..129.508 rows=54259 loops=1)
               Buffers: shared hit=18520
               ->  Index Scan using branch_po_state_idx on branch_purchase_order o  (cost=0.42..807.12 rows=1672 width=17) (actual time=0.037..4.863 rows=1916 loops=1)
                     Index Cond: ((po_state)::text = 'PLACED'::text)
                     Filter: ((supplier)::text = 'XX'::text)
                     Rows Removed by Filter: 3050
                     Buffers: shared hit=2157
               ->  Index Scan using ssales_ib_replace_order_no on stocksales_ib ss  (cost=0.44..33.74 rows=1 width=31) (actual time=0.014..0.044 rows=28 loops=1916)
                     Index Cond: (replace((order_no)::text, ' '::text, ''::text) = ((o.branch_code)::text || (o.po_number)::text))
                     Filter: ((o.supplier)::bpchar = branch_code)
                     Rows Removed by Filter: 0
                     Buffers: shared hit=16363
         ->  Index Scan using branch_purchase_order_product_code_idx on branch_purchase_order_products p  (cost=0.43..5.45 rows=250 width=12) (actual time=0.018..0.335 rows=425 loops=54259)
               Index Cond: ((product_code)::text = (ss.product_code)::text)
               Buffers: shared hit=23199473 dirtied=1
 Total runtime: 23935.995 ms
(22 rows)
So we can see straight away that the outer Nested Loop is estimated to return
1 row but actually returns 53595. I suspect this isn't going to help the
planner pick the most efficient plan.
I've tried increasing default_statistics_target to the maximum (10000) and
re-analysing all the tables involved, but this does not improve the estimate.
I suspect this is because the join condition is based on a function result,
which means the column statistics are ignored?
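One avenue we're considering (untested, and the index name is just invented
for illustration) is giving ANALYZE an expression to collect statistics on.
As we understand it, PostgreSQL gathers statistics on the computed values of
expression indexes, so an index on the concatenation - mirroring the existing
ssales_ib_replace_order_no index on the other side of the join - might give
the planner something to work with:

```sql
-- Hypothetical: an expression index so that ANALYZE collects statistics on
-- branch_code || po_number itself, rather than on the two columns separately.
CREATE INDEX branch_po_order_no_expr_idx
    ON branch_purchase_order ((branch_code || po_number));

ANALYZE branch_purchase_order;
```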
What has improved runtimes is using a WITH clause to carry out the first join
explicitly. Although it runs in half the time, the estimates are still way
out, and I suspect it's only quicker by chance: by limiting the planner's
choices I happen to have pushed it onto a different, faster plan (a Hash Join
and Seq Scan).
with bpo as (
select
o.branch_code || o.po_number as order_no,
o.po_id,
o.supplier,
o.branch_code,
p.product_code
from branch_purchase_order o
join branch_purchase_order_products p using(po_id)
where
o.po_state='PLACED'
and o.supplier='XX'
)
select
po_id,
product_code,
sum(qty) as dispatch_qty,
max(invoice_date) as dispatch_date,
count(invoice_date) as dispatch_count
from (
select
o.po_id,
o.product_code,
ss.qty,
ss.invoice_date
from
bpo o
join stocksales_ib ss on o.supplier=ss.branch_code
and o.product_code=ss.product_code
and o.order_no=replace(ss.order_no,' ','')
) x
group by po_id,product_code
Explain:
http://explain.depesz.com/s/r7v
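Since on 9.3 a WITH clause acts as an optimisation fence, the gain may simply
come from forcing the first join to be materialised. Another variant we may
try (an untested sketch of the same idea) is a temp table, which would also
let us ANALYZE the intermediate result so the planner has real statistics for
the join to stocksales_ib:

```sql
-- Untested sketch: materialise the first join, then ANALYZE it before
-- running the aggregate query against it in place of the CTE.
CREATE TEMP TABLE bpo AS
SELECT o.branch_code || o.po_number AS order_no,
       o.po_id,
       o.supplier,
       o.branch_code,
       p.product_code
FROM branch_purchase_order o
JOIN branch_purchase_order_products p USING (po_id)
WHERE o.po_state = 'PLACED'
  AND o.supplier = 'XX';

ANALYZE bpo;
```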
Can anyone suggest a better approach for improving the plan for this type
of query?
select version();
                                                    version
---------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.3.10 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.6.3 20120306 (Red Hat 4.6.3-2), 64-bit
Regards,
--
David