From: | Gustavo Rezende Montesino <gustavo(dot)montesino(at)trtsp(dot)jus(dot)br> |
---|---|
To: | Clailson <clailson(dot)dba(at)gmail(dot)com>, phillip(at)couto(dot)in |
Cc: | "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: Optimization inner join |
Date: | 2017-01-19 13:23:54 |
Message-ID: | 5880BDEA.9050804@trtsp.jus.br |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hello,
Em 19/01/2017 11:04, Clailson escreveu:
> Hi Phillip.
>
>
>>
>> Not sure if it is all that common. Curious what if you put b.b IS NOT
>> NULL in the WHERE statement?
>
> It's the question. In the company I work with, one of my clients asked
> me: "Why PostgreSQL does not remove rows with null in column b (table
> b), before joining, since these rows have no corresponding in table
> a?" I gave the suggestion to put the IS NOT NULL in the WHERE
> statement, but HE can't modify the query in the application.
>
> I did the tests with Oracle and it uses a predicate in the query plan,
> removing the lines where b.b is null. In Oracle, it´s the same plan,
> with and without IS NOT NULL in the WHERE statement.
Being the client in question, I would like to make a little remark: What
we thought could be optimized here at first is on the row estimate of
the index scan; which could take null_frac into account. To put things
into perspective, our similar case in production has a table with 6
million lines where only 9.5k aren´t null for the join field, an the
over-estimation is throwing away good plans (like ~150ms execution time)
in favor of pretty bad ones (~80s execution time).
We´ve asked application people to put the where not null workaround,
which works great, and are waiting on an answer, but I believe getting
better estimates without that would be great if possible.
>
> On 19/01/2017 09:34, Phillip Couto wrote:
>> NULL is still a value that may be paired with a NULL in a.a
Is that so? I would believe you would never get a match, as NULL <> NULL
>>> On Jan 19, 2017, at 05:08, Clailson <clailson(dot)dba(at)gmail(dot)com
>>> <mailto:clailson(dot)dba(at)gmail(dot)com>> wrote:
>>>
>>> Hi,
>>>
>>> Is there something in the roadmap to optimize the inner join?
>>>
>>> I've this situation above. Table b has 400 rows with null in the
>>> column b.
>>>
>>> explain analyze select * from a inner join b on (b.b = a.a);
>>> "Merge Join (cost=0.55..65.30 rows=599 width=16) (actual time=0.030..1.173 rows=599 loops=1)"
>>> " Merge Cond: (a.a = b.b)"
>>> " -> Index Scan using a_pkey on a (cost=0.28..35.27 rows=1000 width=8) (actual time=0.014..0.364 rows=1000 loops=1)"
>>> " -> Index Scan using in01 on b (cost=0.28..33.27 rows=1000 width=8) (actual time=0.012..0.249 rows=600 loops=1)"
>>> "Total runtime: 1.248 ms"
>>>
>>> My question is: Why the planner isn't removing the null rows during
>>> the scan of table b?
>>> --
>>> Clailson Soares Dinízio de Almeida
>>
>
Regards,
Gustavo R. Montesino
Tribunal Regional do Trabalho da 2a Região
Secretaria de Tecnologia da Informação e Comunicação
Coordenadoria de Infraestrutura de TIC
Seção de Administração de Banco de Dados
Av. Marquês de São Vicente, 121 - Bl. A - Sala 404
Telefone: (11) 3150-2082
From | Date | Subject | |
---|---|---|---|
Next Message | Vitalii Tymchyshyn | 2017-01-19 13:30:33 | Re: Optimization inner join |
Previous Message | Phillip Couto | 2017-01-19 13:18:19 | Re: Optimization inner join |