From: | Thomas Mayer <thomas(dot)mayer(at)student(dot)kit(dot)edu> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-performance <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: window function induces full table scan |
Date: | 2014-01-03 17:44:50 |
Message-ID: | 52C6F712.6040804@student.kit.edu |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Am 03.01.2014 15:54, schrieb Tom Lane:
> Thomas Mayer <thomas(dot)mayer(at)student(dot)kit(dot)edu> writes:
>> To implement the optimization, subquery_is_pushdown_safe() needs to
>> return true if pushing down the quals to a subquery which has window
>> functions is in fact safe ("quals that only reference subquery
>> outputs that are listed in the PARTITION clauses of all window functions
>> in the subquery").
>
> I'd just remove that check.
>
>> Plus, there is a function qual_is_pushdown_safe(...) which contains an
>> assertion, which might possibly become obsolete:
>
> No, that should stay. There are no window functions in the upper query's
> WHERE, there will be none pushed into the lower's WHERE, and that's as it
> must be.
>
>> Tom, do you think that these two changes could be sufficient?
>
> Certainly not. What you'd need to do is include the
> is-it-listed-in-all-PARTITION-clauses consideration in the code that marks
> "unsafe" subquery output columns.
Clear. That's what I intended to write ;)
For better performance, we could first check subquery->hasWindowFuncs
and only if this evaluates to true, check if the attribute is marked as
a "unsafe subquery output column". If it's unsafe
subquery_is_pushdown_safe() needs to return false.
I was first thinking to do the decision safe/unsafe in the
subquery_is_pushdown_safe() function or in a new function that is called
by subquery_is_pushdown_safe().
... "mark" ...: Do I understand you correctly, that you prefer doing the
decision elsewhere and store the result (safe/unsafe) boolean value
besides to the subquery output fields? For the push-down, a subquery
output field must be available anyways.
More in general, one could possibly "mark" safe subquery output fields
for all the tests in subquery_is_pushdown_safe(). So this function would
not necessarily have to be in allpaths.c at all. But that kind of
refactoring would be a different issue which also could be implemented
separately.
> And update all the relevant comments.
> And maybe add a couple of regression test cases.
>
Indeed, that would be nice ;) Straightforward, there should be
- A test case with one window function (in the subquery) and one
condition, which is safe to be pushed down. Then, assert that
subquery_is_pushdown_safe() returns true
- A test case with one window function and one condition, which is
unsafe to be pushed down. Then, assert that subquery_is_pushdown_safe()
returns false
- A test case with multiple window functions and multiple conditions,
all safe to be pushed down. Then, assert that
subquery_is_pushdown_safe() returns true
- A test case with multiple window functions and multiple conditions,
all except one safe to be pushed down. Then, assert that
subquery_is_pushdown_safe() returns true for the safe ones and false for
the unsafe ones
- a test case that ensures that, after the change, the right query plan
is chosen (integration test).
- execute some example queries (integration test).
What do you think about it? What else needs to be tested?
> Offhand I think the details of testing whether a given output column
> appears in a given partition clause are identical to testing whether
> it appears in the distinctClause. So you'd just be mechanizing running
> through the windowClause list to verify whether this holds for all
> the WINDOW clauses.
When a field is element of all PARTITION BY clauses of all window
functions, it does not mean that this field is distinct. Think of a
query like
SELECT * FROM (
SELECT
b,
rank() OVER (PARTITION BY b ORDER BY c DESC) AS rankc
FROM tbl;
) as tmp
WHERE tmp.b=1
In that case, the field b is not distinct (as there is no GROUP BY b).
Anyways, tmp.b=1 would be safe to to be pushed down.
Do you mean that a GROUP BY b is done implicitly (and internally) at a
deeper level just for the window function and _therefore_ is distinct at
that point?
Does the safe/unsafe marking survive recursiveness (over subquery
levels) then?
>
> Note that if you just look at the windowClause list, then you might
> be filtering by named window definitions that appeared in the WINDOW
> clause but were never actually referenced by any window function.
> I don't have a problem with blowing off the optimization in such cases.
> I don't think it's appropriate to expend the cycles that would be needed
> to discover whether they're all referenced at this point. (If anyone ever
> complains, it'd be much cheaper to modify the parser to get rid of
> unreferenced window definitions.)
>
> regards, tom lane
> .
>
Best regards
Thomas
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2014-01-03 18:04:13 | Re: window function induces full table scan |
Previous Message | Tom Lane | 2014-01-03 14:54:25 | Re: window function induces full table scan |