From: | Hywel Carver <hywel(at)skillerwhale(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Self-join optimisation |
Date: | 2021-03-11 14:06:23 |
Message-ID: | CAFcA2FZjbhtdSsPO2MaDw2dGzZez2ViLNoKpG-LPXNnwnj5q8g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
I asked this question in the Postgres Slack, and was recommended to ask
here instead.
A few times, I've been in a situation where I want to join a table to
itself on its primary key. That typically happens because I have some kind
of summary view, which I then want to join to the original table (using its
primary key) to flesh out the summary data with other columns. That's
executed as a join, which surprised me. But in this case, I could extend
the view to have all of the columns of the original table to avoid the join.
But there's another case that's harder to solve this way: combining views
together. Here's a trivial example:
CREATE TABLE users (id BIGINT PRIMARY KEY, varchar name);
CREATE VIEW only_some_users AS (SELECT * FROM users WHERE id < 10);
CREATE VIEW some_other_users AS (SELECT * FROM users WHERE id > 3);
EXPLAIN SELECT * FROM only_some_users
INNER JOIN some_other_users ON only_some_users.id = some_other_users.id;
Hash Join (cost=29.23..43.32 rows=90 width=144)
Hash Cond: (users.id = users_1.id)
-> Bitmap Heap Scan on users (cost=6.24..19.62 rows=270 width=72)
Recheck Cond: (id < 10)
-> Bitmap Index Scan on users_pkey (cost=0.00..6.18 rows=270
width=0)
Index Cond: (id < 10)
-> Hash (cost=19.62..19.62 rows=270 width=72)
-> Bitmap Heap Scan on users users_1 (cost=6.24..19.62 rows=270
width=72)
Recheck Cond: (id > 3)
-> Bitmap Index Scan on users_pkey (cost=0.00..6.18
rows=270 width=0)
Index Cond: (id > 3)
Is there a reason why Postgres doesn't have an optimisation built in to
optimise this JOIN? What I'm imagining is that a join between two aliases
for the same table on its primary key could be optimised by treating them
as the same table. I think the same would be true for self-joins on any
non-null columns covered by a uniqueness constraint.
If this is considered a desirable change, I'd be keen to work on it (with
some guidance).
Thanks,
Hywel
<https://files.slack.com/files-pri/TMKTMS7PB-F01E0TSQH3P/sw_horizontal_colour__1_.png>
From | Date | Subject | |
---|---|---|---|
Next Message | Surafel Temesgen | 2021-03-11 14:14:50 | Re: WIP: System Versioned Temporal Table |
Previous Message | Matthias van de Meent | 2021-03-11 13:45:54 | Re: non-HOT update not looking at FSM for large tuple update |