From: | Luís Roberto Weck <luisroberto(at)siscobra(dot)com(dot)br> |
---|---|
To: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
Cc: | "pgsql-performance(at)lists(dot)postgresql(dot)org" <pgsql-performance(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Hash Join over Nested Loop |
Date: | 2019-11-22 18:48:22 |
Message-ID: | 64ab98b9-703d-5e4e-c428-f0c370fcc7ac@siscobra.com.br |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
****
Em 22/11/2019 14:55, Pavel Stehule escreveu:
>
>
> pá 22. 11. 2019 v 18:37 odesílatel Luís Roberto Weck
> <luisroberto(at)siscobra(dot)com(dot)br <mailto:luisroberto(at)siscobra(dot)com(dot)br>> napsal:
>
>> Hey,
>>
>> I'm trying to figure out why Postgres is choosing a Hash Join
>> over a Nested Loop in this query:
>>
>> SELECT T1.PesID, T1.PesNom, T1.PesValSal, T1.PesCPFCNPJ,
>> T2.CarAti, T1.CarCod, T1.EmpCod,
>> T2.CarFan, T1.PesDatAge, T1.PesCod, COALESCE(
>> T3.PesDatAnt, DATE '00010101') AS PesDatAnt
>> FROM ((public.Pessoa T1
>> INNER JOIN public.Carteira T2 ON T2.EmpCod =
>> T1.EmpCod AND T2.CarCod = T1.CarCod)
>> LEFT JOIN (SELECT MIN(COALESCE( T5.ConVenAnt, DATE
>> '00010101')) AS PesDatAnt, T4.EmpCod, T4.CarCod, T4.ConPesCod AS
>> ConPesCod
>> FROM (public.Contrato T4
>> LEFT JOIN (SELECT
>> MIN(ConParDatVen) AS ConVenAnt, EmpCod, CarCod, ConPesCod, ConSeq
>> FROM public.ContratoParcela T5
>> WHERE ConParAti = true
>> AND ConParValSal > 0
>> GROUP BY EmpCod, CarCod, ConPesCod, ConSeq ) T5 ON T5.EmpCod =
>> T4.EmpCod AND
>> T5.CarCod = T4.CarCod AND
>> T5.ConPesCod = T4.ConPesCod AND
>> T5.ConSeq = T4.ConSeq)
>> WHERE T4.ConAti = TRUE
>> GROUP BY T4.EmpCod, T4.CarCod, T4.ConPesCod ) T3 ON t3.EmpCod
>> = T1.EmpCod AND
>> t3.CarCod = T1.CarCod AND
>> t3.ConPesCod = T1.PesCod)
>> WHERE (T2.CarAti = true)
>> AND (T1.EmpCod = 112)
>> and (UPPER(T1.PesNom) like UPPER('%MARIA%'))
>> ORDER BY T1.EmpCod, T1.CarCod, T1.PesCod
>>
>> Here the Hash Join[1] plan takes ~700ms, and if I change the
>> first LEFT JOIN to a LEFT JOIN LATERAL, forcing a nested loop,
>> the query[2] runs in 3ms.
>>
>> [1] https://explain.depesz.com/s/8IL3
>> [2] https://explain.depesz.com/s/f8Q9
>
>
> Maybe I am wrong, but probably you have to do more than just change
> LEFT JOIN to LATERAL JOIN. Lateral join is based on correlated
> subquery - so you had to push some predicates to subquery - and then
> the query can be much more effective.
>
> Regards
>
> Pavel
>
>
>
>
>
> PostgreSQL version is 11.5, I have run analyze on all the tables.
>
> PG settings:
>
> name |setting |unit|
> -------------------------------|---------|----|
> autovacuum |on | |
> default_statistics_target |250 | |
> effective_cache_size |983040 |8kB |
> effective_io_concurrency |200 | |
> max_parallel_workers |6 | |
> max_parallel_workers_per_gather|3 | |
> random_page_cost |1.1 | |
> work_mem |51200 |kB |
>
I'm sorry, I am not sure I understood.
This is the altered query:
SELECT T1.PesID, T1.PesNom, T1.PesValSal, T1.PesCPFCNPJ, T2.CarAti,
T1.CarCod, T1.EmpCod, T2.CarFan, T1.PesDatAge, T1.PesCod,
COALESCE( T3.PesDatAnt, DATE '00010101') AS PesDatAnt
FROM ((public.Pessoa T1
INNER JOIN public.Carteira T2 ON T2.EmpCod = T1.EmpCod AND
T2.CarCod = T1.CarCod)
LEFT JOIN *LATERAL *(SELECT MIN(COALESCE( T5.ConVenAnt,
DATE '00010101')) AS PesDatAnt, T4.EmpCod, T4.CarCod, T4.ConPesCod AS
ConPesCod
FROM (public.Contrato T4
LEFT JOIN (SELECT
MIN(ConParDatVen) AS ConVenAnt, EmpCod, CarCod, ConPesCod, ConSeq
FROM
public.ContratoParcela T5
WHERE ConParAti = true
and ConParValSal > 0
GROUP BY EmpCod,
CarCod, ConPesCod, ConSeq ) T5 ON T5.EmpCod = T4.EmpCod AND T5.CarCod =
T4.CarCod AND T5.ConPesCod = T4.ConPesCod AND T5.ConSeq = T4.ConSeq)
WHERE T4.ConAti = TRUE
*AND t4.EmpCod = T1.EmpCod AND t4.CarCod = T1.CarCod AND t4.ConPesCod =
T1.PesCod*
GROUP BY T4.EmpCod, T4.CarCod,
T4.ConPesCod ) T3 ON *TRUE ) --ON t3.EmpCod = T1.EmpCod AND t3.CarCod =
T1.CarCod AND t3.ConPesCod = T1.PesCod) *
WHERE (T2.CarAti = true)
AND (T1.EmpCod = 112)
and (UPPER(T1.PesNom) like UPPER('%MARIA%'))
ORDER BY T1.EmpCod, T1.CarCod, T1.PesCod
In bold are the changes I've made to the query. I am sure PostgreSQL is
able to push it down, since it is much faster now.The problem I have is
that this is a query generated by an ORM, So I can't change it.
I would like to understand why wasn't Postgres able to optimize it to a
nested loop. Is there something I can do with the statistics?
Thanks!!
From | Date | Subject | |
---|---|---|---|
Next Message | Pavel Stehule | 2019-11-22 18:54:43 | Re: Hash Join over Nested Loop |
Previous Message | Michael Lewis | 2019-11-22 18:13:30 | Re: Re[4]: Postgresql planning time too high |