From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Pantelis Theodosiou <ypercube(at)gmail(dot)com> |
Cc: | Shay Rojansky <roji(at)roji(dot)org>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, pgsql-docs <pgsql-docs(at)postgresql(dot)org> |
Subject: | Re: Clarify the ordering guarantees in combining queries (or lack thereof) |
Date: | 2022-07-14 12:01:48 |
Message-ID: | 2742511.1657800108@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-docs |
Pantelis Theodosiou <ypercube(at)gmail(dot)com> writes:
> On Thu, Jul 14, 2022 at 9:16 AM Shay Rojansky <roji(at)roji(dot)org> wrote:
>> I was trying to understand what - if any - are the guarantees with
>> regards to ordering for combining queries (UNION/UNION ALL/...).
> No, there is no guarantee. It's just that UNION ALL works this way today
> (preserving the order of the subselects) - and I'm not even sure about
> that, it may not preserve the order in all cases, with different indexes or
> partitioning or a parallel plan, etc.
Yeah, that. You can get a parallelized plan today for UNION ALL:
=# explain analyze select * from foo union all select * from foo;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=0.00..208552.05 rows=5120008 width=244) (actual time=0.652..390.135 rows=5120000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Append (cost=0.00..208552.05 rows=2133336 width=244) (actual time=0.021..228.848 rows=1706667 loops=3)
-> Parallel Seq Scan on foo (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.453..78.084 rows=853333 loops=3)
-> Parallel Seq Scan on foo foo_1 (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.024..125.299 rows=1280000 loops=2)
Planning Time: 0.094 ms
Execution Time: 488.352 ms
It's true that in simple non-parallelized cases we'll do the first query
then the second, but SQL doesn't promise that to be true and neither does
Postgres.
>>> If you want ordered output use ORDER BY.
>> I don't see how that could be done. Consider the following:
>> (SELECT id FROM data ORDER BY id)
>> UNION ALL
>> (SELECT id FROM data ORDER BY id DESC);
You do it like this:
=# explain analyze (select * from foo union all select * from foo) order by unique1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Gather Merge (cost=839258.04..889070.63 rows=4266672 width=244) (actual time=931.054..1707.780 rows=5120000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=839258.01..844591.35 rows=2133336 width=244) (actual time=924.821..1132.096 rows=1706667 loops=3)
Sort Key: foo.unique1
Sort Method: external merge Disk: 433552kB
Worker 0: Sort Method: external merge Disk: 423400kB
Worker 1: Sort Method: external merge Disk: 415592kB
-> Parallel Append (cost=0.00..208552.05 rows=2133336 width=244) (actual time=0.051..218.476 rows=1706667 loops=3)
-> Parallel Seq Scan on foo (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.030..79.118 rows=853333 loops=3)
-> Parallel Seq Scan on foo foo_1 (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.073..118.566 rows=1280000 loops=2)
Planning Time: 0.109 ms
Execution Time: 1830.390 ms
(13 rows)
The parentheses are actually optional here, if memory serves --- to get
the ORDER BY to be applied inside the second sub-select, you'd have to
use parens as Shay had it.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | David G. Johnston | 2022-07-14 12:28:44 | Re: Clarify the ordering guarantees in combining queries (or lack thereof) |
Previous Message | Pantelis Theodosiou | 2022-07-14 10:22:43 | Re: Clarify the ordering guarantees in combining queries (or lack thereof) |