Wrong row count estimation in query with simple UNION ALL leads to dramatically slow plan

From: Michael Kolomeitsev <mkolomeitsev(at)gmail(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Wrong row count estimation in query with simple UNION ALL leads to dramatically slow plan
Date: 2014-01-06 05:50:47
Message-ID: CAABbzO1AXtst7SDifnQ3Fjz4WaF6okrPPpsumqNZpz3g7QkqMQ@mail.gmail.com
Lists: pgsql-performance

The problem was described here earlier, but unfortunately there are no answers:
http://www.postgresql.org/message-id/1387780366.910542228@f394.i.mail.ru
It looks like a defect.

CREATE TABLE t1 (c1 INTEGER, id INTEGER PRIMARY KEY);
INSERT INTO t1 (c1, id) SELECT b, b FROM generate_series(1, 1000000) a (b);
REINDEX TABLE t1;
ANALYZE t1;

CREATE TABLE t2 (c1 INTEGER, id INTEGER PRIMARY KEY);
INSERT INTO t2 (c1, id) SELECT b, b FROM generate_series(1, 1000000) a (b);
REINDEX TABLE t2;
ANALYZE t2;

CREATE TEMP TABLE t3 (ref_id INTEGER);
INSERT INTO t3 (ref_id) VALUES (333333), (666666);
ANALYZE t3;

test=> EXPLAIN (ANALYZE) SELECT * FROM (SELECT * FROM t1 UNION ALL SELECT * FROM t2) t
       INNER JOIN t3 ON t3.ref_id = t.id;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.42..34.84 rows=20000 width=12) (actual time=0.046..0.104 rows=4 loops=1)
   ->  Seq Scan on t3  (cost=0.00..1.02 rows=2 width=4) (actual time=0.008..0.009 rows=2 loops=1)
   ->  Append  (cost=0.42..16.89 rows=2 width=8) (actual time=0.023..0.042 rows=2 loops=2)
         ->  Index Scan using t1_pkey on t1  (cost=0.42..8.45 rows=1 width=8) (actual time=0.020..0.022 rows=1 loops=2)
               Index Cond: (id = t3.ref_id)
         ->  Index Scan using t2_pkey on t2  (cost=0.42..8.45 rows=1 width=8) (actual time=0.015..0.016 rows=1 loops=2)
               Index Cond: (id = t3.ref_id)
 Total runtime: 0.184 ms
(8 rows)

This plan is perfect. But the row estimate is not: 20000 estimated vs. 4 actual.
As far as I can see, Pg is able to make the correct row estimates for the inputs: inner Append: rows=2, outer Seq Scan on t3: rows=2. So the Nested Loop should know it can produce at most 2 * 2 = 4 rows.
Moreover, the cost estimate is _correct_! It corresponds to rows=4, not rows=20000.

Why is a correct row estimate important? Because it matters in a more complex query.
Let's join another big table into that query:

CREATE TABLE links (c1 INTEGER PRIMARY KEY, descr TEXT);
INSERT INTO links (c1, descr) SELECT b, '2' FROM generate_series(1, 100000) a (b);
REINDEX TABLE links;
ANALYZE links;

test=> EXPLAIN (ANALYZE) SELECT * FROM (SELECT * FROM t1 UNION ALL SELECT * FROM t2) t
       INNER JOIN t3 ON t3.ref_id = t.id INNER JOIN links l ON (t.c1 = l.c1);
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=2693.43..3127.84 rows=20000 width=18) (actual time=33.619..33.619 rows=0 loops=1)
   Hash Cond: (t1.c1 = l.c1)
   ->  Nested Loop  (cost=0.42..34.84 rows=20000 width=12) (actual time=0.038..0.078 rows=4 loops=1)
         ->  Seq Scan on t3  (cost=0.00..1.02 rows=2 width=4) (actual time=0.006..0.007 rows=2 loops=1)
         ->  Append  (cost=0.42..16.89 rows=2 width=8) (actual time=0.017..0.029 rows=2 loops=2)
               ->  Index Scan using t1_pkey on t1  (cost=0.42..8.45 rows=1 width=8) (actual time=0.015..0.017 rows=1 loops=2)
                     Index Cond: (id = t3.ref_id)
               ->  Index Scan using t2_pkey on t2  (cost=0.42..8.45 rows=1 width=8) (actual time=0.009..0.009 rows=1 loops=2)
                     Index Cond: (id = t3.ref_id)
   ->  Hash  (cost=1443.00..1443.00 rows=100000 width=6) (actual time=33.479..33.479 rows=100000 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 3711kB
         ->  Seq Scan on links l  (cost=0.00..1443.00 rows=100000 width=6) (actual time=0.017..14.853 rows=100000 loops=1)
 Total runtime: 33.716 ms
(13 rows)

The planner thinks there will be 20000 rows when the join between "t" and "t3" is performed, and that is why it decides to use a hash join with the "links" table.
Let's prove it:

CREATE OR REPLACE FUNCTION public.f1()
RETURNS SETOF integer
LANGUAGE plpgsql
ROWS 20000
AS $function$
BEGIN
  RETURN QUERY EXECUTE 'SELECT t.c1 FROM (SELECT * FROM t1 UNION ALL SELECT * FROM t2) t INNER JOIN t3 ON t3.ref_id = t.id';
END;
$function$;

test=> explain select * from f1() t(c1) INNER JOIN links l ON (t.c1 = l.c1);
QUERY PLAN
---------------------------------------------------------------------------
Hash Join (cost=2693.25..3293.25 rows=20000 width=10)
Hash Cond: (t.c1 = l.c1)
-> Function Scan on f1 t (cost=0.25..200.25 rows=20000 width=4)
-> Hash (cost=1443.00..1443.00 rows=100000 width=6)
-> Seq Scan on links l (cost=0.00..1443.00 rows=100000 width=6)
(5 rows)

The same "defective" plan.

test=> ALTER FUNCTION f1() ROWS 4;
ALTER FUNCTION
test=> explain select * from f1() t(c1) INNER JOIN links l ON (t.c1 = l.c1);
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Nested Loop  (cost=0.54..33.58 rows=4 width=10)
   ->  Function Scan on f1 t  (cost=0.25..0.29 rows=4 width=4)
   ->  Index Scan using links_pkey on links l  (cost=0.29..8.31 rows=1 width=6)
         Index Cond: (c1 = t.c1)
(4 rows)

The correct/perfect plan.
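Another quick way to check that the bad row estimate is what pushes the planner toward the hash join (rather than the hash join being genuinely cheaper) is to disable hash joins for the session and compare plans. This is a diagnostic sketch only, not a fix:

```sql
-- Diagnostic only: steer the planner away from hash joins for this session.
SET enable_hashjoin = off;

EXPLAIN (ANALYZE)
SELECT *
FROM (SELECT * FROM t1 UNION ALL SELECT * FROM t2) t
INNER JOIN t3 ON t3.ref_id = t.id
INNER JOIN links l ON (t.c1 = l.c1);

-- Restore the default afterwards.
RESET enable_hashjoin;
```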

In real life my "links" table is bigger, and the wrong plan slows execution significantly.
I found several workarounds, so it is no longer a problem for me.
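For example, one workaround (a sketch of the general idea, not necessarily the exact one used in practice) is to push the joins down into each UNION ALL branch, so the planner estimates each branch's join on its own instead of multiplying the misestimated Append:

```sql
-- Equivalent result for these queries, since every row of "t" comes from
-- exactly one branch; each branch join now gets its own row estimate.
SELECT t1.c1, t1.id, t3.ref_id, l.descr
FROM t1
INNER JOIN t3 ON t3.ref_id = t1.id
INNER JOIN links l ON (t1.c1 = l.c1)
UNION ALL
SELECT t2.c1, t2.id, t3.ref_id, l.descr
FROM t2
INNER JOIN t3 ON t3.ref_id = t2.id
INNER JOIN links l ON (t2.c1 = l.c1);
```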

I just want to report this "strange thing".
I tried to look into the source code and found some interesting places there, but I think that is of little use: the Pg developers know the situation much better than I do.
