union vs. sort

From: Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
To: pgsql-hackers <pgsql-hackers(at)postgreSQL(dot)org>
Subject: union vs. sort
Date: 2004-04-06 12:04:35
Message-ID: 20040406120435.GA11681@zf.jcu.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


I'm surprise with query plan that PostgreSQL planner prepare for
selects with ORDER BY if all data are from sub-select that is already
sorted.

# explain select data from
(select distinct data from addr)
as x order by x.data;
-------------------------------------------------
Subquery Scan x
-> Unique
-> Sort
Sort Key: data
-> Seq Scan on addr

This is right -- the main of query doesn't use "Sort" for ORDER BY,
because subselect is sorted by "Unique".

And almost same query, but in the subselect is union:

# explain select data from
(select data from addr
union
select data from addr2)
as x order by x.data;
-----------------------------------------
Sort
Sort Key: data
-> Subquery Scan x
-> Unique
-> Sort
Sort Key: data
-> Append
-> Subquery Scan "*SELECT* 1"
-> Seq Scan on addr
-> Subquery Scan "*SELECT* 2"
-> Seq Scan on addr2


I think it's bad, because there is used extra sort for ORDER BY for
already by "Unique" sorted data.

If I add ORDER BY to subselect:

# explain select data from
(select data from addr
union
select data from addr2 order by data)
as x order by x.data;
---------------------------------------------------
Sort
Sort Key: data
-> Subquery Scan x
-> Sort
Sort Key: data
-> Unique
-> Sort
Sort Key: data
-> Append
-> Subquery Scan "*SELECT* 1"
-> Seq Scan on addr
-> Subquery Scan "*SELECT* 2"
-> Seq Scan on addr2

I see two unnecessary sorts for unique and already sorted data.

The core of problem is probbaly UNION, because if I use simple query without
subselect it still sort already sorderd data:

# explain select data from addr
union
select data from addr2
order by data;
-----------------------------------
Sort
Sort Key: data
-> Unique
-> Sort
Sort Key: data
-> Append
-> Subquery Scan "*SELECT* 1"
-> Seq Scan on addr
-> Subquery Scan "*SELECT* 2"
-> Seq Scan on addr2

Or order of data which returns "unique" is for UNION diffrent that data
from DISTINCT? (see first example).

Karel

--
Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
http://home.zf.jcu.cz/~zakkr/

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Richard Huxton 2004-04-06 12:23:43 Re: pg_hba.conf view from the database?
Previous Message Fabien COELHO 2004-04-06 11:10:12 pg_hba.conf view from the database?