Re: window function induces full table scan

From: Thomas Mayer <thomas(dot)mayer(at)student(dot)kit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: window function induces full table scan
Date: 2014-01-03 17:44:50
Message-ID: 52C6F712.6040804@student.kit.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Am 03.01.2014 15:54, schrieb Tom Lane:
> Thomas Mayer <thomas(dot)mayer(at)student(dot)kit(dot)edu> writes:
>> To implement the optimization, subquery_is_pushdown_safe() needs to
>> return true if pushing down the quals to a subquery which has window
>> functions is in fact safe ("quals that only reference subquery
>> outputs that are listed in the PARTITION clauses of all window functions
>> in the subquery").
>
> I'd just remove that check.
>
>> Plus, there is a function qual_is_pushdown_safe(...) which contains an
>> assertion, which might possibly become obsolete:
>
> No, that should stay. There are no window functions in the upper query's
> WHERE, there will be none pushed into the lower's WHERE, and that's as it
> must be.
>
>> Tom, do you think that these two changes could be sufficient?
>
> Certainly not. What you'd need to do is include the
> is-it-listed-in-all-PARTITION-clauses consideration in the code that marks
> "unsafe" subquery output columns.

Clear. That's what I intended to write ;)

For better performance, we could first check subquery->hasWindowFuncs
and only if this evaluates to true, check if the attribute is marked as
a "unsafe subquery output column". If it's unsafe
subquery_is_pushdown_safe() needs to return false.

I was first thinking to do the decision safe/unsafe in the
subquery_is_pushdown_safe() function or in a new function that is called
by subquery_is_pushdown_safe().

... "mark" ...: Do I understand you correctly, that you prefer doing the
decision elsewhere and store the result (safe/unsafe) boolean value
besides to the subquery output fields? For the push-down, a subquery
output field must be available anyways.

More in general, one could possibly "mark" safe subquery output fields
for all the tests in subquery_is_pushdown_safe(). So this function would
not necessarily have to be in allpaths.c at all. But that kind of
refactoring would be a different issue which also could be implemented
separately.

> And update all the relevant comments.
> And maybe add a couple of regression test cases.
>

Indeed, that would be nice ;) Straightforward, there should be
- A test case with one window function (in the subquery) and one
condition, which is safe to be pushed down. Then, assert that
subquery_is_pushdown_safe() returns true
- A test case with one window function and one condition, which is
unsafe to be pushed down. Then, assert that subquery_is_pushdown_safe()
returns false
- A test case with multiple window functions and multiple conditions,
all safe to be pushed down. Then, assert that
subquery_is_pushdown_safe() returns true
- A test case with multiple window functions and multiple conditions,
all except one safe to be pushed down. Then, assert that
subquery_is_pushdown_safe() returns true for the safe ones and false for
the unsafe ones
- a test case that ensures that, after the change, the right query plan
is chosen (integration test).
- execute some example queries (integration test).

What do you think about it? What else needs to be tested?

> Offhand I think the details of testing whether a given output column
> appears in a given partition clause are identical to testing whether
> it appears in the distinctClause. So you'd just be mechanizing running
> through the windowClause list to verify whether this holds for all
> the WINDOW clauses.

When a field is element of all PARTITION BY clauses of all window
functions, it does not mean that this field is distinct. Think of a
query like

SELECT * FROM (
SELECT
b,
rank() OVER (PARTITION BY b ORDER BY c DESC) AS rankc
FROM tbl;
) as tmp
WHERE tmp.b=1

In that case, the field b is not distinct (as there is no GROUP BY b).
Anyways, tmp.b=1 would be safe to to be pushed down.

Do you mean that a GROUP BY b is done implicitly (and internally) at a
deeper level just for the window function and _therefore_ is distinct at
that point?

Does the safe/unsafe marking survive recursiveness (over subquery
levels) then?

>
> Note that if you just look at the windowClause list, then you might
> be filtering by named window definitions that appeared in the WINDOW
> clause but were never actually referenced by any window function.
> I don't have a problem with blowing off the optimization in such cases.
> I don't think it's appropriate to expend the cycles that would be needed
> to discover whether they're all referenced at this point. (If anyone ever
> complains, it'd be much cheaper to modify the parser to get rid of
> unreferenced window definitions.)
>
> regards, tom lane
> .
>

Best regards
Thomas

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2014-01-03 18:04:13 Re: window function induces full table scan
Previous Message Tom Lane 2014-01-03 14:54:25 Re: window function induces full table scan