RE: NL Join vs Merge Join - 5 hours vs 2 seconds

From: "Kumar, Virendra" <Virendra(dot)Kumar(at)guycarp(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: RE: NL Join vs Merge Join - 5 hours vs 2 seconds
Date: 2018-12-17 20:20:12
Message-ID: 2cd3f208288045c485794337941f0e44@USFKL11XG20CN01.mercer.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thank you Tom for initial thoughts!

I tried query with function dependency statistics but it did not help. I don't see we can create such statistics on table join.

Regards,
Virendra

-----Original Message-----
From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
Sent: Monday, December 17, 2018 12:02 PM
To: Kumar, Virendra
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: NL Join vs Merge Join - 5 hours vs 2 seconds

"Kumar, Virendra" <Virendra(dot)Kumar(at)guycarp(dot)com> writes:
> We are using Postgers-XL based on PostgreSQL 10.5 on RHEL7. We have an extreme case of one query which running in 4 hour 45 mins and 33 seconds with Nested Loops Join vs 2.5 seconds when we disable enable_nestloop. The query is given below. I would be happy to share more information if required.

It looks like what's burning you is bad estimates at these joins:

> -> Nested Loop (cost=33681.00..71431.33 rows=1 width=348) (actual time=359.006..926.303 rows=37207 loops=3)
> Join Filter: (se.account_id = a.account_id)
> -> Hash Join (cost=33680.71..71298.94 rows=395 width=300) (actual time=358.953..656.923 rows=36828 loops=3)
> Hash Cond: ((se.account_id = sh.account_id) AND (se.site_id = sh.site_id))
...
> -> Index Scan using account_p1000013_account_id_idx on account_p1000013 a (cost=0.29..0.32 rows=1 width=52) (actual time=0.003..0.005 rows=1 loops=110485)
> Index Cond: (account_id = sh.account_id)
> Filter: (portfolio_id = 1000013)

I'm guessing that account_id, site_id, and portfolio_id are all highly
correlated, but the planner doesn't know that and thinks the additional
conditions will remove way more rows than they actually do.

In PG10 and later, you can probably fix that by creating
functional-dependency statistics on those pairs of columns.

regards, tom lane

________________________________

This message is intended only for the use of the addressee and may contain
information that is PRIVILEGED AND CONFIDENTIAL.

If you are not the intended recipient, you are hereby notified that any
dissemination of this communication is strictly prohibited. If you have
received this communication in error, please erase all copies of the message
and its attachments and notify the sender immediately. Thank you.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ron 2018-12-17 20:32:25 Re: Alter table column constraint
Previous Message Rich Shepard 2018-12-17 20:20:09 Re: Alter table column constraint