Re: Hash Join over Nested Loop

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Luís Roberto Weck <luisroberto(at)siscobra(dot)com(dot)br>
Cc: "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: Hash Join over Nested Loop
Date: 2019-11-22 18:54:43
Message-ID: CAFj8pRDb5LP1CpWTFMjHCqjgxs8H=-R2qMOa0Hh1q48OVMTvwg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

pá 22. 11. 2019 v 19:42 odesílatel Luís Roberto Weck <
luisroberto(at)siscobra(dot)com(dot)br> napsal:

> Em 22/11/2019 14:55, Pavel Stehule escreveu:
>
>
>
> pá 22. 11. 2019 v 18:37 odesílatel Luís Roberto Weck <
> luisroberto(at)siscobra(dot)com(dot)br> napsal:
>
>> Hey,
>>
>> I'm trying to figure out why Postgres is choosing a Hash Join over a
>> Nested Loop in this query:
>>
>> SELECT T1.PesID, T1.PesNom, T1.PesValSal, T1.PesCPFCNPJ, T2.CarAti,
>> T1.CarCod, T1.EmpCod,
>> T2.CarFan, T1.PesDatAge, T1.PesCod, COALESCE( T3.PesDatAnt, DATE
>> '00010101') AS PesDatAnt
>> FROM ((public.Pessoa T1
>> INNER JOIN public.Carteira T2 ON T2.EmpCod = T1.EmpCod AND
>> T2.CarCod = T1.CarCod)
>> LEFT JOIN (SELECT MIN(COALESCE( T5.ConVenAnt, DATE
>> '00010101')) AS PesDatAnt, T4.EmpCod, T4.CarCod, T4.ConPesCod AS ConPesCod
>> FROM (public.Contrato T4
>> LEFT JOIN (SELECT MIN(ConParDatVen) AS
>> ConVenAnt, EmpCod, CarCod, ConPesCod, ConSeq
>> FROM
>> public.ContratoParcela T5
>> WHERE ConParAti = true
>> AND ConParValSal > 0
>> GROUP BY EmpCod, CarCod,
>> ConPesCod, ConSeq ) T5 ON T5.EmpCod = T4.EmpCod AND
>>
>> T5.CarCod = T4.CarCod AND
>>
>> T5.ConPesCod = T4.ConPesCod AND
>>
>> T5.ConSeq = T4.ConSeq)
>> WHERE T4.ConAti = TRUE
>> GROUP BY T4.EmpCod, T4.CarCod, T4.ConPesCod )
>> T3 ON t3.EmpCod = T1.EmpCod AND
>>
>> t3.CarCod = T1.CarCod AND
>>
>> t3.ConPesCod = T1.PesCod)
>> WHERE (T2.CarAti = true)
>> AND (T1.EmpCod = 112)
>> and (UPPER(T1.PesNom) like UPPER('%MARIA%'))
>> ORDER BY T1.EmpCod, T1.CarCod, T1.PesCod
>>
>> Here the Hash Join[1] plan takes ~700ms, and if I change the first LEFT
>> JOIN to a LEFT JOIN LATERAL, forcing a nested loop, the query[2] runs in
>> 3ms.
>>
>> [1] https://explain.depesz.com/s/8IL3
>> [2] https://explain.depesz.com/s/f8Q9
>>
>>
> Maybe I am wrong, but probably you have to do more than just change LEFT
> JOIN to LATERAL JOIN. Lateral join is based on correlated subquery - so you
> had to push some predicates to subquery - and then the query can be much
> more effective.
>
> Regards
>
> Pavel
>
>
>
>
>
>> PostgreSQL version is 11.5, I have run analyze on all the tables.
>>
>> PG settings:
>>
>> name |setting |unit|
>> -------------------------------|---------|----|
>> autovacuum |on | |
>> default_statistics_target |250 | |
>> effective_cache_size |983040 |8kB |
>> effective_io_concurrency |200 | |
>> max_parallel_workers |6 | |
>> max_parallel_workers_per_gather|3 | |
>> random_page_cost |1.1 | |
>> work_mem |51200 |kB |
>>
>
> I'm sorry, I am not sure I understood.
>
> This is the altered query:
>
> SELECT T1.PesID, T1.PesNom, T1.PesValSal, T1.PesCPFCNPJ, T2.CarAti,
> T1.CarCod, T1.EmpCod, T2.CarFan, T1.PesDatAge, T1.PesCod,
> COALESCE( T3.PesDatAnt, DATE '00010101') AS PesDatAnt
> FROM ((public.Pessoa T1
> INNER JOIN public.Carteira T2 ON T2.EmpCod = T1.EmpCod AND
> T2.CarCod = T1.CarCod)
> LEFT JOIN *LATERAL *(SELECT MIN(COALESCE( T5.ConVenAnt, DATE
> '00010101')) AS PesDatAnt, T4.EmpCod, T4.CarCod, T4.ConPesCod AS ConPesCod
> FROM (public.Contrato T4
> LEFT JOIN (SELECT
> MIN(ConParDatVen) AS ConVenAnt, EmpCod, CarCod, ConPesCod, ConSeq
> FROM
> public.ContratoParcela T5
> WHERE ConParAti = true
> and ConParValSal > 0
>
> GROUP BY EmpCod,
> CarCod, ConPesCod, ConSeq ) T5 ON T5.EmpCod = T4.EmpCod AND T5.CarCod =
> T4.CarCod AND T5.ConPesCod = T4.ConPesCod AND T5.ConSeq = T4.ConSeq)
> WHERE T4.ConAti = TRUE
> *AND t4.EmpCod = T1.EmpCod AND
> t4.CarCod = T1.CarCod AND t4.ConPesCod = T1.PesCod*
> GROUP BY T4.EmpCod, T4.CarCod,
> T4.ConPesCod ) T3 ON *TRUE ) --ON t3.EmpCod = T1.EmpCod AND t3.CarCod =
> T1.CarCod AND t3.ConPesCod = T1.PesCod) *
> WHERE (T2.CarAti = true)
> AND (T1.EmpCod = 112)
> and (UPPER(T1.PesNom) like UPPER('%MARIA%'))
> ORDER BY T1.EmpCod, T1.CarCod, T1.PesCod
>
> In bold are the changes I've made to the query. I am sure PostgreSQL is
> able to push it down, since it is much faster now. The problem I have is
> that this is a query generated by an ORM, So I can't change it.
>
> I would like to understand why wasn't Postgres able to optimize it to a
> nested loop. Is there something I can do with the statistics?
>

I don't think. Postgres optimizer just doesn't support this optimization.
It has sense only when you know so number of loops is very small - else
nested loop should be much slower.

Regards

Pavel

> Thanks!!
>
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tomas Vondra 2019-11-22 19:36:02 Re: Postgresql planning time too high
Previous Message Luís Roberto Weck 2019-11-22 18:48:22 Re: Hash Join over Nested Loop