Re: Overhead of union versus union all

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Scott Bailey <artacus(at)comcast(dot)net>, pgsql-general(at)postgresql(dot)org
Subject: Re: Overhead of union versus union all
Date: 2009-07-10 17:37:27
Message-ID: 1247247447.26589.328.camel@monkey-cat.sm.truviso.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Fri, 2009-07-10 at 14:22 +0100, Simon Riggs wrote:
> I mean it seems possible to prove that the distinct removal step is not
> necessary, by proving that the various sub-queries are already disjoint.
> It's a common manual optimization, so automating it seems a reasonable
> future goal.

There are even simpler cases that postgresql can't optimize. Consider:

-- foo has a primary key
SELECT * FROM foo UNION SELECT * FROM foo;

That's logically equivalent to:

SELECT * FROM foo;

But postgresql will add a sort anyway.

There are lots of optimizations along these lines. They seem obscure,
but these optimizations become much more useful when using views or
complex queries where the same table appears multiple times. For
instance, if you have two views that are projections of the same table,
then, you join the views together, you can optimize away the join in
some cases, and just scan the original table.

I think a lot of these optimizations depend on knowing which tables (or
subqueries) are relations in the relational theory sense; i.e.
unordered, distinct, and have no NULLs in the relevant attributes.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message James B. Byrne 2009-07-10 17:38:57 How to trace client sql requests?
Previous Message decibel 2009-07-10 17:37:02 Re: REINDEX "is not a btree"