Pushing IN (subquery) down through UNION ALL?

From: Dave Johansen <davejohansen(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Pushing IN (subquery) down through UNION ALL?
Date: 2011-02-24 04:10:31
Message-ID: AANLkTimDp8w3s9hmMjkz-e6ZzXzgZF6JtUbdraYc6aVQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I'm using PostgreSQL 8.3.3 and I have a view that does a UNION ALL on
two joins and it doesn't seem to want to push the IN (subquery)
optimization down into the plan for the two queries being unioned. Is
there something I can do to fix this? Or is it just a limitation of
the planner/optimizer?

I also tried this with 8.4.7 and it seemed to exhibit the same
behaviour, so here's an example of what I'm talking about (obviously
in a real system I'd have indexes and all that other fun stuff):

CREATE TABLE users (id SERIAL PRIMARY KEY, name TEXT);
CREATE TABLE addresses1 (userid INTEGER, value INTEGER);
CREATE TABLE addresses1 (userid INTEGER, value INTEGER);
CREATE VIEW addressesall AS SELECT u.id, u.name, a.value FROM
addresses1 AS a JOIN users AS u ON a.userid=u.id UNION ALL SELECT
u.id, u.name, a.value FROM addresses2 AS a JOIN users AS u ON
a.userid=u.id;

Here's the EXPLAIN output for two example queries:

test=# EXPLAIN ANALYZE SELECT * FROM addressesall WHERE id IN (SELECT
id FROM users WHERE name='A');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=2.15..5.58 rows=1 width=40) (actual
time=0.144..0.340 rows=3 loops=1)
Hash Cond: (u.id = users.id)
-> Append (cost=1.09..4.48 rows=9 width=40) (actual
time=0.059..0.239 rows=9 loops=1)
-> Hash Join (cost=1.09..2.19 rows=4 width=10) (actual
time=0.055..0.075 rows=4 loops=1)
Hash Cond: (a.userid = u.id)
-> Seq Scan on addresses1 a (cost=0.00..1.04 rows=4
width=8) (actual time=0.006..0.013 rows=4 loops=1)
-> Hash (cost=1.04..1.04 rows=4 width=6) (actual
time=0.019..0.019 rows=4 loops=1)
-> Seq Scan on users u (cost=0.00..1.04 rows=4
width=6) (actual time=0.003..0.008 rows=4 loops=1)
-> Hash Join (cost=1.09..2.21 rows=5 width=10) (actual
time=0.109..0.133 rows=5 loops=1)
Hash Cond: (a.userid = u.id)
-> Seq Scan on addresses2 a (cost=0.00..1.05 rows=5
width=8) (actual time=0.004..0.012 rows=5 loops=1)
-> Hash (cost=1.04..1.04 rows=4 width=6) (actual
time=0.020..0.020 rows=4 loops=1)
-> Seq Scan on users u (cost=0.00..1.04 rows=4
width=6) (actual time=0.004..0.010 rows=4 loops=1)
-> Hash (cost=1.05..1.05 rows=1 width=4) (actual
time=0.053..0.053 rows=1 loops=1)
-> Seq Scan on users (cost=0.00..1.05 rows=1 width=4)
(actual time=0.032..0.040 rows=1 loops=1)
Filter: (name = 'A'::text)
Total runtime: 0.519 ms
(17 rows)

test=# EXPLAIN ANALYZE SELECT * FROM addressesall WHERE id IN (SELECT
id FROM users WHERE name='A');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=2.15..5.58 rows=1 width=40) (actual
time=0.144..0.340 rows=3 loops=1)
Hash Cond: (u.id = users.id)
-> Append (cost=1.09..4.48 rows=9 width=40) (actual
time=0.059..0.239 rows=9 loops=1)
-> Hash Join (cost=1.09..2.19 rows=4 width=10) (actual
time=0.055..0.075 rows=4 loops=1)
Hash Cond: (a.userid = u.id)
-> Seq Scan on addresses1 a (cost=0.00..1.04 rows=4
width=8) (actual time=0.006..0.013 rows=4 loops=1)
-> Hash (cost=1.04..1.04 rows=4 width=6) (actual
time=0.019..0.019 rows=4 loops=1)
-> Seq Scan on users u (cost=0.00..1.04 rows=4
width=6) (actual time=0.003..0.008 rows=4 loops=1)
-> Hash Join (cost=1.09..2.21 rows=5 width=10) (actual
time=0.109..0.133 rows=5 loops=1)
Hash Cond: (a.userid = u.id)
-> Seq Scan on addresses2 a (cost=0.00..1.05 rows=5
width=8) (actual time=0.004..0.012 rows=5 loops=1)
-> Hash (cost=1.04..1.04 rows=4 width=6) (actual
time=0.020..0.020 rows=4 loops=1)
-> Seq Scan on users u (cost=0.00..1.04 rows=4
width=6) (actual time=0.004..0.010 rows=4 loops=1)
-> Hash (cost=1.05..1.05 rows=1 width=4) (actual
time=0.053..0.053 rows=1 loops=1)
-> Seq Scan on users (cost=0.00..1.05 rows=1 width=4)
(actual time=0.032..0.040 rows=1 loops=1)
Filter: (name = 'A'::text)
Total runtime: 0.519 ms
(17 rows)

test=# EXPLAIN ANALYZE SELECT * FROM addressesall WHERE id IN (1);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Result (cost=0.00..4.27 rows=3 width=40) (actual time=0.053..0.114
rows=3 loops=1)
-> Append (cost=0.00..4.27 rows=3 width=40) (actual
time=0.049..0.101 rows=3 loops=1)
-> Nested Loop (cost=0.00..2.12 rows=2 width=10) (actual
time=0.046..0.063 rows=2 loops=1)
-> Seq Scan on users u (cost=0.00..1.05 rows=1
width=6) (actual time=0.025..0.028 rows=1 loops=1)
Filter: (id = 1)
-> Seq Scan on addresses1 a (cost=0.00..1.05 rows=2
width=8) (actual time=0.009..0.017 rows=2 loops=1)
Filter: (a.userid = 1)
-> Nested Loop (cost=0.00..2.12 rows=1 width=10) (actual
time=0.015..0.025 rows=1 loops=1)
-> Seq Scan on addresses2 a (cost=0.00..1.06 rows=1
width=8) (actual time=0.005..0.008 rows=1 loops=1)
Filter: (userid = 1)
-> Seq Scan on users u (cost=0.00..1.05 rows=1
width=6) (actual time=0.004..0.007 rows=1 loops=1)
Filter: (u.id = 1)
Total runtime: 0.251 ms
(13 rows)

You'll notice that the subquery version is doing the full join and
then the filtering, but the explicitly listed version pushing the
filtering into the plan before the join. Is there a way to make the
subquery version perform the same optimization?

Thanks,
Dave

Browse pgsql-performance by date

  From Date Subject
Next Message Pierre C 2011-02-24 08:22:53 Re: performance issue in the fields.
Previous Message Tom Lane 2011-02-23 21:37:34 Re: NULLS LAST performance