Nested loop does not preserve order. Why?

From: Alexey Bashtanov <bashtanov(at)imap(dot)cc>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Nested loop does not preserve order. Why?
Date: 2014-02-11 08:54:33
Message-ID: 52F9E549.9060900@imap.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hello!

Let us have a table T(A, B), an index on T(B, A) and a much smaller
table V(B).

stat-dev1/ui_dev_16 12:27:07 > create table u as select a, a % 1000 b
from generate_series(1, 10000000) a;
SELECT 10000000
stat-dev1/ui_dev_16 12:27:28 > create table v as select distinct b from
u where b % 100 = 1;
SELECT 10
stat-dev1/ui_dev_16 12:27:41 > analyze t;
ANALYZE
stat-dev1/ui_dev_16 12:27:42 > analyze v;
ANALYZE
stat-dev1/ui_dev_16 12:27:45 > create index u_i on u(b,a);
CREATE INDEX

And we are looking for all rows of T that have B present in V and sorted
by B, A.
The most natural way to obtain them would be to take all records from V
ordered by B, and to perform an index-only scan on T for each B.
But not for postgresql :(

stat-dev1/ui_dev_16 12:28:57 > EXPLAIN ANALYZE select b, a from u where
b in (select b from v) order by b, a;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Sort (cost=879810.77..892310.77 rows=5000000 width=8) (actual
time=4728.831..4756.802 rows=100000 loops=1)
Sort Key: u.b, u.a
Sort Method: external merge Disk: 1760kB
-> Hash Join (cost=1.35..186749.35 rows=5000000 width=8) (actual
time=0.121..4568.170 rows=100000 loops=1)
Hash Cond: (u.b = v.b)
-> Seq Scan on u (cost=0.00..144248.00 rows=10000000
width=8) (actual time=0.027..2294.474 rows=10000000 loops=1)
-> Hash (cost=1.23..1.23 rows=10 width=4) (actual
time=0.071..0.071 rows=10 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> HashAggregate (cost=1.12..1.23 rows=10 width=4)
(actual time=0.062..0.063 rows=10 loops=1)
-> Seq Scan on v (cost=0.00..1.10 rows=10
width=4) (actual time=0.050..0.052 rows=10 loops=1)
Total runtime: 4771.817 ms
(11 rows)

Replacing IN by EXISTS changes nothing. Putting an ORDER BY clause into
IN(...) unsurprisingly does not help.
The only way to obtain the desired plan I found is to rewrite the query:

stat-dev1/ui_dev_16 12:29:06 > set enable_bitmapscan to off;
SET
stat-dev1/ui_dev_16 12:29:11 > set enable_seqscan to off;
SET
stat-dev1/ui_dev_16 12:29:13 > EXPLAIN ANALYZE select b, a from u where
b = any((select array_agg(b) from v)::int[]) order by b, a;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using u_i on u (cost=10000000001.58..10000315253.45
rows=99551 width=8) (actual time=0.078..103.239 rows=100000 loops=1)
Index Cond: (b = ANY ($0))
Heap Fetches: 100000
InitPlan 1 (returns $0)
-> Aggregate (cost=10000000001.13..10000000001.14 rows=1
width=4) (actual time=0.018..0.018 rows=1 loops=1)
-> Seq Scan on v (cost=10000000000.00..10000000001.10
rows=10 width=4) (actual time=0.006..0.009 rows=10 loops=1)
Total runtime: 119.929 ms
(7 rows)

The plan I expect for the original query is something like that:

Nested Loop Preserving Order (cost=1.85..317895.69 rows=100000 width=8)
-> HashAggregate (cost=1.42..1.52 rows=10 width=4)
-> Sort (cost=1.27..1.29 rows=10 width=4)
Sort Key: v.b
-> Seq Scan on v (cost=0.00..1.10 rows=10 width=4)
-> Index Only Scan using u_i on u (cost=0.43..31689.42 rows=10000
width=8)
Index Cond: (b = v.b)

But if we try to hint postgresql we will realize that nested loop does
not preserve order (see below). Why?

stat-dev1/ui_dev_16 12:42:32 > set enable_bitmapscan to off;
SET
Time: 0,895 ms
stat-dev1/ui_dev_16 12:42:33 > set enable_hashjoin to off;
SET
Time: 0,762 ms
stat-dev1/ui_dev_16 12:42:36 > EXPLAIN select b, a from u where b in
(select b from v order by b) order by b, a;
QUERY PLAN
--------------------------------------------------------------------------------------
Sort (cost=326200.51..326450.51 rows=100000 width=8)
Sort Key: u.b, u.a
-> Nested Loop (cost=1.85..317895.69 rows=100000 width=8)
-> HashAggregate (cost=1.42..1.52 rows=10 width=4)
-> Sort (cost=1.27..1.29 rows=10 width=4)
Sort Key: v.b
-> Seq Scan on v (cost=0.00..1.10 rows=10 width=4)
-> Index Only Scan using u_i on u (cost=0.43..31689.42
rows=10000 width=8)
Index Cond: (b = v.b)
(9 rows)

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message mduyunov 2014-02-11 14:32:10 BUG #9186: CONTEXT log line still appears when turned off
Previous Message RogerBerkelmans 2014-02-11 02:27:19 BUG #9182: Data type (text) issue with MS SQL 2012