JOIN on partitions is very slow

From: daya airody <daya(dot)airody(at)gmail(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: JOIN on partitions is very slow
Date: 2020-03-22 17:22:48
Message-ID: CAFg0VMDrPS+CQF7z0e2SFNtW3wjWV_jVY2mkEWpF-eeg6jha=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi folks,

We are using postgreSQL database and I am hitting some limits. I have
partitions on company_sale_account table
based on company name

We generate a report on accounts matched between the two. Below is the
query:

SELECT DISTINCT cpsa1.*
FROM company_sale_account cpsa1
JOIN company_sale_account cpsa2 ON cpsa1.sale_account_id =
cpsa2.sale_account_id
WHERE cpsa1.company_name = 'company_a'
AND cpsa2.company_name = 'company_b'

We have setup BTREE indexes on sale_account_id column on both the tables.
This worked fine till recently. Now, we have 10 million rows in
company_a partition and 7 million rows in company_b partition. This query
is taking
more than 10 minutes.

Below is the explain plan output for it:

Buffers: shared hit=20125996 read=47811 dirtied=75, temp read=1333427
written=1333427
I/O Timings: read=19619.322
-> Sort (cost=167950986.43..168904299.23 rows=381325118 width=132)
(actual time=517017.334..603691.048 rows=16854094 loops=1)
Sort Key: cpsa1.crm_account_id, ((cpsa1.account_name)::text),
((cpsa1.account_owner)::text), ((cpsa1.account_type)::text),
cpsa1.is_customer, ((date_part('epoch'::text,
cpsa1.created_date))::integer),
((hstore_to_json(cpsa1.custom_crm_fields))::tex (...)
Sort Method: external merge Disk: 2862656kB
Buffers: shared hit=20125996 read=47811 dirtied=75, temp
read=1333427 written=1333427
I/O Timings: read=19619.322
-> Nested Loop (cost=0.00..9331268.39 rows=381325118 width=132)
(actual time=1.680..118698.570 rows=16854094 loops=1)
Buffers: shared hit=20125977 read=47811 dirtied=75
I/O Timings: read=19619.322
-> Append (cost=0.00..100718.94 rows=2033676 width=33)
(actual time=0.014..1783.243 rows=2033675 loops=1)
Buffers: shared hit=75298 dirtied=75
-> Seq Scan on company_sale_account cpsa2
(cost=0.00..0.00 rows=1 width=516) (actual time=0.001..0.001 rows=0
loops=1)
Filter: ((company_name)::text = 'company_b'::text)
-> Seq Scan on company_sale_account_concur cpsa2_1
(cost=0.00..100718.94 rows=2033675 width=33) (actual time=0.013..938.145
rows=2033675 loops=1)
Filter: ((company_name)::text = 'company_b'::text)
Buffers: shared hit=75298 dirtied=75
-> Append (cost=0.00..1.97 rows=23 width=355) (actual
time=0.034..0.047 rows=8 loops=2033675)
Buffers: shared hit=20050679 read=47811
I/O Timings: read=19619.322
-> Seq Scan on company_sale_account cpsa1
(cost=0.00..0.00 rows=1 width=4525) (actual time=0.000..0.000 rows=0
loops=2033675)
Filter: (((company_name)::text =
'company_a'::text) AND ((cpsa2.sale_account_id)::text =
(sale_account_id)::text))
-> Index Scan using ix_csa_adp_sale_account on
company_sale_account_adp cpsa1_1 (cost=0.56..1.97 rows=22 width=165)
(actual time=0.033..0.042 rows=8 loops=2033675)
Index Cond: ((sale_account_id)::text =
(cpsa2.sale_account_id)::text)
Filter: ((company_name)::text = 'company_a'::text)
Buffers: shared hit=20050679 read=47811
I/O Timings: read=19619.322
Planning time: 30.853 ms
Execution time: 618218.321 ms

Do you have any suggestion on how to tune postgres.
Please share your thoughts. It would be a great help to me.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Michael Lewis 2020-03-22 18:08:37 Re: JOIN on partitions is very slow
Previous Message Tom Lane 2020-03-21 22:55:44 Re: Slow planning time when public schema included (12 vs. 9.4)