Re: window function induces full table scan

From: Thomas Mayer <thomas(dot)mayer(at)student(dot)kit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: window function induces full table scan
Date: 2014-01-03 00:13:00
Message-ID: 52C6008C.7090201@student.kit.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Just to track it down: The limitation can also be reproduced without
using views. Using views is just a use case where the suggested
optimization is actually needed.

Plus, when I remove the condition "WHERE datepos = 1", the same
behaviour still occurs. Here, I wanted to see if postgresql is
preferring the condition "WHERE datepos = 1" (datepos is the result of
the window function) over the condition "user_id = 43" for optimization.
But this is not the case.

-- workaround example: "WHERE user_id = 43" condition in subselect

SELECT user_id, latitude, longitude
FROM (
SELECT
user_id,
latitude,
longitude,
rank() OVER (PARTITION BY user_id ORDER BY date DESC, id DESC)
AS datepos
FROM checkin_node
WHERE user_id = 43
) AS tmp_last_position
WHERE datepos = 1; -- takes 2 ms

-- track it down: reproduce limitation without a view:

SELECT user_id, latitude, longitude
FROM (
SELECT
user_id,
latitude,
longitude,
rank() OVER (PARTITION BY user_id ORDER BY date DESC, id DESC)
AS datepos
FROM checkin_node
) AS tmp_last_position
WHERE datepos = 1
AND user_id = 43; -- takes 6621 ms

-- without datepos condition

SELECT user_id, latitude, longitude
FROM (
SELECT
user_id,
latitude,
longitude,
rank() OVER (PARTITION BY user_id ORDER BY date DESC, id DESC)
AS datepos
FROM checkin_node
) AS tmp_last_position
WHERE user_id = 43; -- takes 6574 ms

Best regards,
Thomas

Am 03.01.2014 00:12, schrieb Thomas Mayer:
>
> Am 02.01.2014 23:43, schrieb Tom Lane:
>> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
>>> On Thu, Jan 2, 2014 at 1:52 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> It's possible that in the specific case you exhibit here, pushing down
>>>> the clause wouldn't result in changes in the window function's output for
>>>> the selected rows, but the optimizer doesn't have enough knowledge about
>>>> window functions to determine that.
>>
>>> A restriction in the WHERE clause which corresponds to the PARTITION BY
>>> should be pushable, no? I think it doesn't need to understand the internal
>>> semantics of the window function itself, just of the PARTITION BY, which
>>> should be doable, at least in principle.
>>
>> If the restriction clause must give the same answer for any two rows of
>> the same partition, then yeah, we could in principle push it down without
>> knowing anything about the specific window function. It'd be a less than
>> trivial test to make, I think. In any case, it's not a "bug" that the
>> optimizer doesn't do this currently.
>
> I agree, this is not a "bug" in v9.3.2 in terms of correctness.
>
> But it's a limitation, because the query plan is by far not optimal. You
> may consider this report as a feature request then.
>
> The optimization I suggested is normally performed, when no window
> function occurs in the statement.
>
> It seems like the optimizer is already capable of doing a check if the
> WHERE can be done first.
>
> However, this check seems to be done too conservative: I guess, the
> check is ignoring the PARTITION-BY-sets of attributes completely.
>
>>
>> regards, tom lane
>> .
>>
>
> Best regards
> Thomas
>
>

--
======================================
Thomas Mayer
Durlacher Allee 61
D-76131 Karlsruhe
Telefon: +49-721-2081661
Fax: +49-721-72380001
Mobil: +49-174-2152332
E-Mail: thomas(dot)mayer(at)student(dot)kit(dot)edu
=======================================

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Thomas Mayer 2014-01-03 04:37:31 Re: window function induces full table scan
Previous Message Tom Lane 2014-01-02 23:55:55 Re: window function induces full table scan