Re: Hash Join over Nested Loop

From: Luís Roberto Weck <luisroberto(at)siscobra(dot)com(dot)br>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: Hash Join over Nested Loop
Date: 2019-11-22 18:48:22
Message-ID: 64ab98b9-703d-5e4e-c428-f0c370fcc7ac@siscobra.com.br
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

****
Em 22/11/2019 14:55, Pavel Stehule escreveu:
>
>
> pá 22. 11. 2019 v 18:37 odesílatel Luís Roberto Weck
> <luisroberto(at)siscobra(dot)com(dot)br <mailto:luisroberto(at)siscobra(dot)com(dot)br>> napsal:
>
>> Hey,
>>
>> I'm trying to figure out why Postgres is choosing a Hash Join
>> over a Nested Loop in this query:
>>
>> SELECT T1.PesID, T1.PesNom, T1.PesValSal, T1.PesCPFCNPJ,
>> T2.CarAti, T1.CarCod, T1.EmpCod,
>>        T2.CarFan, T1.PesDatAge, T1.PesCod, COALESCE(
>> T3.PesDatAnt, DATE '00010101') AS PesDatAnt
>>   FROM ((public.Pessoa T1
>>             INNER JOIN public.Carteira T2 ON T2.EmpCod =
>> T1.EmpCod AND T2.CarCod = T1.CarCod)
>>              LEFT JOIN  (SELECT MIN(COALESCE( T5.ConVenAnt, DATE
>> '00010101')) AS PesDatAnt, T4.EmpCod, T4.CarCod, T4.ConPesCod AS
>> ConPesCod
>>                            FROM (public.Contrato T4
>>                                  LEFT JOIN (SELECT
>> MIN(ConParDatVen) AS ConVenAnt, EmpCod, CarCod, ConPesCod, ConSeq
>> FROM public.ContratoParcela T5
>> WHERE ConParAti = true
>> AND ConParValSal > 0
>> GROUP BY EmpCod, CarCod, ConPesCod, ConSeq ) T5 ON T5.EmpCod    =
>> T4.EmpCod    AND
>> T5.CarCod    = T4.CarCod    AND
>> T5.ConPesCod = T4.ConPesCod AND
>> T5.ConSeq    = T4.ConSeq)
>>                           WHERE T4.ConAti = TRUE
>> GROUP BY T4.EmpCod, T4.CarCod, T4.ConPesCod ) T3 ON t3.EmpCod   
>> = T1.EmpCod AND
>>                           t3.CarCod    = T1.CarCod AND
>>                           t3.ConPesCod = T1.PesCod)
>>  WHERE (T2.CarAti = true)
>>    AND (T1.EmpCod = 112)
>>    and (UPPER(T1.PesNom) like UPPER('%MARIA%'))
>>  ORDER BY T1.EmpCod, T1.CarCod, T1.PesCod
>>
>> Here the Hash Join[1] plan takes ~700ms, and if I change the
>> first LEFT JOIN to a LEFT JOIN LATERAL, forcing a nested loop,
>> the query[2] runs in 3ms.
>>
>> [1] https://explain.depesz.com/s/8IL3
>> [2] https://explain.depesz.com/s/f8Q9
>
>
> Maybe I am wrong, but probably you have to do more than just change
> LEFT JOIN to LATERAL JOIN. Lateral join is based on correlated
> subquery - so you had to push some predicates to subquery - and then
> the query can be much more effective.
>
> Regards
>
> Pavel
>
>
>
>
>
> PostgreSQL version is 11.5, I have run analyze on all the tables.
>
> PG settings:
>
> name                           |setting  |unit|
> -------------------------------|---------|----|
> autovacuum                     |on       |    |
> default_statistics_target      |250      |    |
> effective_cache_size           |983040   |8kB |
> effective_io_concurrency       |200      |    |
> max_parallel_workers           |6        |    |
> max_parallel_workers_per_gather|3        |    |
> random_page_cost               |1.1      |    |
> work_mem                       |51200    |kB  |
>

I'm sorry, I am not sure I understood.

This is the altered query:

SELECT T1.PesID, T1.PesNom, T1.PesValSal, T1.PesCPFCNPJ, T2.CarAti,
T1.CarCod, T1.EmpCod, T2.CarFan, T1.PesDatAge, T1.PesCod,
       COALESCE( T3.PesDatAnt, DATE '00010101') AS PesDatAnt
  FROM ((public.Pessoa T1
            INNER JOIN public.Carteira T2 ON T2.EmpCod = T1.EmpCod AND
T2.CarCod = T1.CarCod)
             LEFT JOIN *LATERAL *(SELECT MIN(COALESCE( T5.ConVenAnt,
DATE '00010101')) AS PesDatAnt, T4.EmpCod, T4.CarCod, T4.ConPesCod AS
ConPesCod
                                  FROM (public.Contrato T4
                                        LEFT JOIN  (SELECT
MIN(ConParDatVen) AS ConVenAnt, EmpCod, CarCod, ConPesCod, ConSeq
                                                     FROM
public.ContratoParcela T5
                                                    WHERE ConParAti = true
                                                      and ConParValSal > 0

                                                    GROUP BY EmpCod,
CarCod, ConPesCod, ConSeq ) T5 ON  T5.EmpCod = T4.EmpCod AND T5.CarCod =
T4.CarCod AND T5.ConPesCod = T4.ConPesCod AND T5.ConSeq = T4.ConSeq)
                                 WHERE T4.ConAti = TRUE
*AND t4.EmpCod = T1.EmpCod AND t4.CarCod = T1.CarCod AND t4.ConPesCod =
T1.PesCod*
                                 GROUP BY T4.EmpCod, T4.CarCod,
T4.ConPesCod ) T3 ON *TRUE ) --ON t3.EmpCod = T1.EmpCod AND t3.CarCod =
T1.CarCod AND t3.ConPesCod = T1.PesCod) *
 WHERE (T2.CarAti = true)
   AND (T1.EmpCod = 112)
   and (UPPER(T1.PesNom) like UPPER('%MARIA%'))
 ORDER BY T1.EmpCod, T1.CarCod, T1.PesCod

In bold are the changes I've made to the query. I am sure PostgreSQL is
able to push it down, since it is much faster now.The problem I have is
that this is a query generated by an ORM, So I can't change it.

I would like to understand why wasn't Postgres able to optimize it to a
nested loop. Is there something I can do with the statistics?

Thanks!!

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Pavel Stehule 2019-11-22 18:54:43 Re: Hash Join over Nested Loop
Previous Message Michael Lewis 2019-11-22 18:13:30 Re: Re[4]: Postgresql planning time too high