Re: Clarify the ordering guarantees in combining queries (or lack thereof)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pantelis Theodosiou <ypercube(at)gmail(dot)com>
Cc: Shay Rojansky <roji(at)roji(dot)org>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, pgsql-docs <pgsql-docs(at)postgresql(dot)org>
Subject: Re: Clarify the ordering guarantees in combining queries (or lack thereof)
Date: 2022-07-14 12:01:48
Message-ID: 2742511.1657800108@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs

Pantelis Theodosiou <ypercube(at)gmail(dot)com> writes:
> On Thu, Jul 14, 2022 at 9:16 AM Shay Rojansky <roji(at)roji(dot)org> wrote:
>> I was trying to understand what - if any - are the guarantees with
>> regards to ordering for combining queries (UNION/UNION ALL/...).

> No, there is no guarantee. It's just that UNION ALL works this way today
> (preserving the order of the subselects) - and I'm not even sure about
> that, it may not preserve the order in all cases, with different indexes or
> partitioning or a parallel plan, etc.

Yeah, that. You can get a parallelized plan today for UNION ALL:

=# explain analyze select * from foo union all select * from foo;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=0.00..208552.05 rows=5120008 width=244) (actual time=0.652..390.135 rows=5120000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Append (cost=0.00..208552.05 rows=2133336 width=244) (actual time=0.021..228.848 rows=1706667 loops=3)
-> Parallel Seq Scan on foo (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.453..78.084 rows=853333 loops=3)
-> Parallel Seq Scan on foo foo_1 (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.024..125.299 rows=1280000 loops=2)
Planning Time: 0.094 ms
Execution Time: 488.352 ms

It's true that in simple non-parallelized cases we'll do the first query
then the second, but SQL doesn't promise that to be true and neither does
Postgres.

>>> If you want ordered output use ORDER BY.

>> I don't see how that could be done. Consider the following:
>> (SELECT id FROM data ORDER BY id)
>> UNION ALL
>> (SELECT id FROM data ORDER BY id DESC);

You do it like this:

=# explain analyze (select * from foo union all select * from foo) order by unique1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Gather Merge (cost=839258.04..889070.63 rows=4266672 width=244) (actual time=931.054..1707.780 rows=5120000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=839258.01..844591.35 rows=2133336 width=244) (actual time=924.821..1132.096 rows=1706667 loops=3)
Sort Key: foo.unique1
Sort Method: external merge Disk: 433552kB
Worker 0: Sort Method: external merge Disk: 423400kB
Worker 1: Sort Method: external merge Disk: 415592kB
-> Parallel Append (cost=0.00..208552.05 rows=2133336 width=244) (actual time=0.051..218.476 rows=1706667 loops=3)
-> Parallel Seq Scan on foo (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.030..79.118 rows=853333 loops=3)
-> Parallel Seq Scan on foo foo_1 (cost=0.00..98942.68 rows=1066668 width=244) (actual time=0.073..118.566 rows=1280000 loops=2)
Planning Time: 0.109 ms
Execution Time: 1830.390 ms
(13 rows)

The parentheses are actually optional here, if memory serves --- to get
the ORDER BY to be applied inside the second sub-select, you'd have to
use parens as Shay had it.

regards, tom lane

In response to

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message David G. Johnston 2022-07-14 12:28:44 Re: Clarify the ordering guarantees in combining queries (or lack thereof)
Previous Message Pantelis Theodosiou 2022-07-14 10:22:43 Re: Clarify the ordering guarantees in combining queries (or lack thereof)