Optimising outer joins in the presence of non-nullable references

From: Philip Lykke Carlsen <philip(at)hasura(dot)io>
To: pgsql-performance(at)postgresql(dot)org
Subject: Optimising outer joins in the presence of non-nullable references
Date: 2021-05-24 12:10:24
Message-ID: CAN-AqA4MTY34V9nt66YrHE75ZaCpFanqcoTyL3ESbEQMX_Gz-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi list.

I have a question about the different plans produced by postgres for
an outer join versus an inner join.

(complete sql script attached)

Take these two tables:

CREATE TABLE album
( id SERIAL PRIMARY KEY,
title TEXT NOT NULL
);
CREATE INDEX album_title ON album (title);

CREATE TABLE track
( id SERIAL PRIMARY KEY,
album_id INT NOT NULL REFERENCES album(id),
title TEXT NOT NULL
);
CREATE INDEX track_album ON track(album_id);
CREATE INDEX track_title ON track(title);

where crucially `track` references `album(id)` with a `NOT NULL` reference.

Now, if we query both an inner and an outer join on `track` and
`album` (after having suitably filled-in data), we get very different
plans where only the inner join exploits indices.

That is:

EXPLAIN ANALYZE
SELECT t.id
FROM track t
INNER JOIN album a ON (t.album_id = a.id)
ORDER BY a.title ASC
LIMIT 10;

Produces this query plan:

QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.56..3.29 rows=10 width=36) (actual time=0.038..0.052
rows=10 loops=1)
-> Nested Loop (cost=0.56..3606.93 rows=13200 width=36) (actual
time=0.036..0.046 rows=10 loops=1)
-> Index Scan using album_title on album a
(cost=0.28..113.23 rows=1397 width=36) (actual time=0.015..0.016
rows=1 loops=1)
-> Index Scan using track_album on track t (cost=0.29..1.84
rows=66 width=8) (actual time=0.012..0.018 rows=10 loops=1)
Index Cond: (album_id = a.id)
Planning Time: 0.473 ms
Execution Time: 0.096 ms
(7 rows)

While this:

EXPLAIN ANALYZE
SELECT t.id
FROM track t
LEFT JOIN album a ON (t.album_id = a.id)
ORDER BY a.title ASC
LIMIT 10;

Produces this query plan:

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Limit (cost=604.43..604.45 rows=10 width=36) (actual
time=20.934..20.943 rows=10 loops=1)
-> Sort (cost=604.43..637.43 rows=13200 width=36) (actual
time=20.932..20.934 rows=10 loops=1)
Sort Key: a.title
Sort Method: top-N heapsort Memory: 27kB
-> Hash Left Join (cost=42.43..319.18 rows=13200 width=36)
(actual time=1.082..12.333 rows=10000 loops=1)
Hash Cond: (t.album_id = a.id)
-> Seq Scan on track t (cost=0.00..242.00 rows=13200
width=8) (actual time=0.031..3.919 rows=10000 loops=1)
-> Hash (cost=24.97..24.97 rows=1397 width=36)
(actual time=0.990..0.991 rows=1000 loops=1)
Buckets: 2048 Batches: 1 Memory Usage: 101kB
-> Seq Scan on album a (cost=0.00..24.97
rows=1397 width=36) (actual time=0.014..0.451 rows=1000 loops=1)
Planning Time: 0.251 ms
Execution Time: 20.999 ms
(12 rows)

My question then is, shouldn't the inner and outer join queries be
semantically equivalent when the columns we are joining on are
non-nullable foreign keys?
Is there some corner case I'm not considering?
Would it be a good addition to postgres if it could detect this and
produce a plan that exploits the indices?

(My root motivation for asking this question is this github issue:
https://github.com/hasura/graphql-engine/issues/5949)

Regards,
Philip

Attachment Content-Type Size
outer-join-indexes.sql application/sql 1.6 KB

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Bob Jolliffe 2021-05-24 13:52:56 Re: transaction blocking on COMMIT
Previous Message Vijaykumar Jain 2021-05-24 11:35:32 Re: transaction blocking on COMMIT