Re: pgsql: Allow pushdown of WHERE quals into subqueries with window functi

From: Noah Misch <noah(at)leadboat(dot)com>
To: dgrowley(at)gmail(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgsql: Allow pushdown of WHERE quals into subqueries with window functi
Date: 2014-12-20 06:19:58
Message-ID: 20141220061958.GB1838835@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On Sat, Jun 28, 2014 at 06:08:05AM +0000, Tom Lane wrote:
> Allow pushdown of WHERE quals into subqueries with window functions.
>
> We can allow this even without any specific knowledge of the semantics
> of the window function, so long as pushed-down quals will either accept
> every row in a given window partition, or reject every such row. Because
> window functions act only within a partition, such a case can't result
> in changing the window functions' outputs for any surviving row.
> Eliminating entire partitions in this way obviously can reduce the cost
> of the window-function computations substantially.
>
> The fly in the ointment is that it's hard to be entirely sure whether
> this is true for an arbitrary qual condition. This patch allows pushdown
> if (a) the qual references only partitioning columns, and (b) the qual
> contains no volatile functions. We are at risk of incorrect results if
> the qual can produce different answers for values that the partitioning
> equality operator sees as equal. While it's not hard to invent cases
> for which that can happen, it seems to seldom be a problem in practice,
> since no one has complained about a similar assumption that we've had
> for many years with respect to DISTINCT. The potential performance
> gains seem to be worth the risk.

SQL leaves room for our DISTINCT optimization but not for this. Where
pushdown of a qual beneath DISTINCT changes query results, it does so by
changing which one of various equal, distinguishable values survives to
represent a group. SQL says:

If the <set quantifier> DISTINCT is specified, and the most specific type
of a result column is character string, datetime with time zone or a
user-defined type, then the precise values retained in that column after
eliminating redundant duplicates is implementation-dependent.

PostgreSQL conforms, making qual pushdown one of the factors determining which
value to retain. This influences queries having a pushed-down qual that
distinguishes equal values. An optimization fence changes the result:

SELECT * FROM (SELECT DISTINCT x FROM (VALUES (1.00),(1.1),(1.0)) t1(x) OFFSET 0) t0 WHERE length(x::text) <> 4;
x
-----
1.1
(1 row)

SELECT * FROM (SELECT DISTINCT x FROM (VALUES (1.00),(1.1),(1.0)) t1(x)) t0 WHERE length(x::text) <> 4;
x
-----
1.0
1.1
(2 rows)

Type "timetamptz" is unaffected in PostgreSQL. Character strings are affected
less than SQL contemplates, because PostgreSQL strings use binary equality. I
don't know why SQL omits "numeric" from the list of affected types.

Here is the corresponding scenario for window functions:

SELECT x, y FROM (
SELECT x, first_value(x) OVER (PARTITION BY x ORDER BY length(x::text)) AS y FROM
(VALUES (1.00),(1.0)) t1(x)
OFFSET 0
) t0 WHERE length(x::text) = 4;
x | y
------+-----
1.00 | 1.0
(1 row)

SELECT x, y FROM (
SELECT x, first_value(x) OVER (PARTITION BY x ORDER BY length(x::text)) AS y FROM
(VALUES (1.00),(1.0)) t1(x)
) t0 WHERE length(x::text) = 4;
x | y
------+------
1.00 | 1.00
(1 row)

Only the first result is correct. SQL does not give us the wiggle room it
offered for DISTINCT. The conformance sacrifice is perhaps worthwhile
nonetheless. I didn't see this highlighted in the original discussion, so I
wished to ensure this compromise is known explicitly.

If the system catalogs recorded whether each equality operator is also an
identity operator, we could restrict the optimization to types where the
equality operator of the partitioning opclass is so marked. That eliminates
the conformance problem. Such an annotation would facilitate other planner
advances; for example, predtest.c could attempt to refute f(x) given x = k.

Thanks,
nm

In response to

Browse pgsql-committers by date

  From Date Subject
Next Message Heikki Linnakangas 2014-12-21 19:58:38 pgsql: Fix file descriptor leak at end of recovery.
Previous Message Bruce Momjian 2014-12-20 02:03:22 Re: pgsql: doc: restrictions on alter database moving default tablespace

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Kirkwood 2014-12-20 06:32:49 Re: Commitfest problems
Previous Message Andres Freund 2014-12-20 05:15:42 Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)