Re: How to enumerate partitions from a window function?

From: Alban Hertroys <haramrae(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: How to enumerate partitions from a window function?
Date: 2020-09-03 14:35:48
Message-ID: CAF-3MvPAo6UoKr25kJMi6OF3Czf-UC8kLwhY7gtJCFbjpuzD2w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, 3 Sep 2020 at 16:01, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Alban Hertroys <haramrae(at)gmail(dot)com> writes:
> > As stated above, I want to enumerate the runs, starting at 1 and
> > incrementing by 1 every time a partition from the 'run' window closes,
> > Is there a way to achieve this through window functions, or do we need to
> > wrap the thing in a subquery to achieve this?
>
> I think this'll work:
>
> select datetime, property_A, property_B
> , first_value(datetime)::time over run as swap_time
> , dense_rank() over (order by property_A, property_B)
> , value
> from process_data
> window run as (partition by property_A, property_B order by datetime)
> ;
>
> You can't do it with a window function over the "run" window because
> no window function ever looks outside the current partition. But
> that's easy to fix by using a different window definition. The
> planner is smart enough to see that these windows are compatible
> and only need one sort to be performed.
>
> regards, tom lane
>

Thanks Tom,

That gets us close, but it ignores the order of the runs over time. I think
it also reassigns the same number to later runs at the same 'day' that
happen to have the same values for property_A and _B. That's some crucial
information that I forgot to include.

To expand on my original example:

datetime | property_A | property_B | swap_time | run_nr | value
================================================
2020-09-03 15:06 | tea | earl grey | 15:06 | 1 | 0.23
2020-09-03 15:07 | tea | earl grey | 15:06 | 1 | 0.22
2020-09-03 15:08 | tea | ceylon | 15:08 | 2 | 0.34
2020-09-03 15:09 | coffee | cappucino | 15:09 | 3 | 0.45
2020-09-03 15:10 | coffee | cappucino | 15:09 | 3 | 0.43
2020-09-03 15:11 | tea | earl grey | 15:11 | 4 | 0.23
etc.

Where the last row has the same characteristic properties as the first 2
rows (from run 1), but is in run 4 due to it having started after run 3.

The runs normally start at 1 hour before midnight, with run 1, and continue
24h from there (it's a shifted day-schedule). The above example starting at
15:06 is unlikely to occur in reality, although possible (with long
downtime). That's mostly to clarify how the run numbers should function, it
would require to at least partition run_nr by a date shifted 1 hour back,
as long as they number their runs correctly along the time axis.

Regards,
Alban.
--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Lawrence Layhee 2020-09-03 18:31:00 Dependency problem using community repo on Redhat 7
Previous Message Jehan-Guillaume de Rorthais 2020-09-03 14:25:10 Re: Tuchanka