Re: How to hint two columns IS NOT DISTINCT FROM each other in a join

From: Kim Rose Carlsen <krc(at)hiper(dot)dk>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: How to hint two columns IS NOT DISTINCT FROM each other in a join
Date: 2016-10-28 16:35:00
Message-ID: AM4PR0501MB26105A99BC9DF98CEF5967D0C7AD0@AM4PR0501MB2610.eurprd05.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Sorry for double post, just ignore this post..

________________________________
From: pgsql-general-owner(at)postgresql(dot)org <pgsql-general-owner(at)postgresql(dot)org> on behalf of Kim Rose Carlsen <krc(at)hiper(dot)dk>
Sent: Thursday, October 27, 2016 6:34:58 PM
To: pgsql-general(at)postgresql(dot)org
Subject: [GENERAL] How to hint two columns IS NOT DISTINCT FROM each other in a join

This sender failed our fraud detection checks and may not be who they appear to be. Learn about spoofing<http://aka.ms/LearnAboutSpoofing> Feedback<http://aka.ms/SafetyTipsFeedback>

Hi

I was wondering if there is a way to hint that two columns in two different tables IS NOT DISTINCT FROM each other. So that the optimizer may assume if table_a.key = 'test' THEN table_b.key = 'test' .

The equals operator already does this but it does not handle NULLS very well (or not at all). And it seems IS NOT DISTINCT FROM is not indexable and doesn't establish the same inference rules as equals.

Example:

CREATE TABLE a (

id INTEGER PRIMARY KEY,

key VARCHAR,

value VARCHAR

);

CREATE INDEX ON a (key);

INSERT INTO a

VALUES (1, 'test', 'test'), (2, 'foo', 'bar'), (3, null, null), (4, 'baz', 'qoz');

CREATE VIEW view_a AS (

SELECT table_a.id,

table_a.key,

table_a.value,

table_b.key as b_key

FROM a AS table_a

JOIN a AS table_b

ON table_a.id = table_b.id

);

CREATE VIEW view_a_eq AS (

SELECT table_a.id,

table_a.key,

table_a.value,

table_b.key as b_key

FROM a AS table_a

JOIN a AS table_b

ON table_a.id = table_b.id

AND table_a.key = table_b.key

);

CREATE VIEW view_a_distinct AS (

SELECT table_a.id,

table_a.key,

table_a.value,

table_b.key as b_key

FROM a AS table_a

JOIN a AS table_b

ON table_a.id = table_b.id

AND table_a.key IS NOT DISTINCT FROM table_b.key

);

EXPLAIN SELECT * FROM view_a WHERE key = 'test';

QUERY PLAN
------------------------------------------------------------------------------------
Hash Join (cost=12.69..34.42 rows=4 width=100)
Hash Cond: (table_b.id = table_a.id)
-> Seq Scan on a table_b (cost=0.00..18.50 rows=850 width=36)
-> Hash (cost=12.64..12.64 rows=4 width=68)
-> Bitmap Heap Scan on a table_a (cost=4.18..12.64 rows=4 width=68)
Recheck Cond: ((key)::text = 'test'::text)
-> Bitmap Index Scan on a_key_idx (cost=0.00..4.18 rows=4 width=0)
Index Cond: ((key)::text = 'test'::text)

We only get index scan on table_a

EXPLAIN SELECT * FROM view_a_eq WHERE key = 'test';

QUERY PLAN
------------------------------------------------------------------------------------
Nested Loop (cost=8.36..25.53 rows=1 width=100)
Join Filter: (table_a.id = table_b.id)
-> Bitmap Heap Scan on a table_a (cost=4.18..12.64 rows=4 width=68)
Recheck Cond: ((key)::text = 'test'::text)
-> Bitmap Index Scan on a_key_idx (cost=0.00..4.18 rows=4 width=0)
Index Cond: ((key)::text = 'test'::text)
-> Materialize (cost=4.18..12.66 rows=4 width=36)
-> Bitmap Heap Scan on a table_b (cost=4.18..12.64 rows=4 width=36)
Recheck Cond: ((key)::text = 'test'::text)
-> Bitmap Index Scan on a_key_idx (cost=0.00..4.18 rows=4 width=0)
Index Cond: ((key)::text = 'test'::text)
We get index scan on both tables and the where clause is pushed all the way down

EXPLAIN SELECT * FROM view_a_distinct WHERE key = 'test';

QUERY PLAN
------------------------------------------------------------------------------------
Hash Join (cost=12.69..34.43 rows=1 width=100)
Hash Cond: (table_b.id = table_a.id)
Join Filter: (NOT ((table_a.key)::text IS DISTINCT FROM (table_b.key)::text))
-> Seq Scan on a table_b (cost=0.00..18.50 rows=850 width=36)
-> Hash (cost=12.64..12.64 rows=4 width=68)
-> Bitmap Heap Scan on a table_a (cost=4.18..12.64 rows=4 width=68)
Recheck Cond: ((key)::text = 'test'::text)
-> Bitmap Index Scan on a_key_idx (cost=0.00..4.18 rows=4 width=0)
Index Cond: ((key)::text = 'test'::text)
Same as the first example

In these examples it really doesn't matter which plan is used. But for larger view it might be a great hint for the optimizer to know that since we are joining on a primary key we could hint that all other columns for the table is actually NOT DISTINCT from each other. This will result in the planner being able to push the WHERE condition down into the other joined tables.

It works well for the = operator, but it is easy to see the it fails as soon as we asks about nulls.

SELECT * FROM view_a_eq WHERE key IS NULL;
id | key | value | b_key
----+-----+-------+-------
(0 rows)

Maybe my examples are too simple, but I hope you can verify the same will hold for a much larger table with 2 different views on top of them.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Jehan-Guillaume de Rorthais 2016-10-28 16:49:59 Re: Fast Primary shutdown only after wal_sender_timeout
Previous Message Kim Rose Carlsen 2016-10-28 16:29:26 How to hint 2 coulms IS NOT DISTINCT FROM each other