Re: How to enumerate partitions from a window function?

From: Alban Hertroys <haramrae(at)gmail(dot)com>
To: Michael Lewis <mlewis(at)entrata(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: How to enumerate partitions from a window function?
Date: 2020-09-04 08:41:16
Message-ID: CAF-3MvOBGiRLA=PO1mnZohuKd1Jz51JEKP8C5Qnq=JfdwadiJQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, 3 Sep 2020 at 20:59, Michael Lewis <mlewis(at)entrata(dot)com> wrote:

> It seems like you are maybe wanting this- If the previous row is the same,
> then get the previous row's run_nr. If it is different, then increment.
>
> case when lag( property_A ) over() = property_A and lag( property_B )
> over() = property_B then coalesce( lag( run_nr ) over(), 1 ) else lag(
> run_nr ) over() + 1 end
>
> Perhaps there is a much simpler implementation though.
>

That would work were it not that the very column we're defining is the one
to be aliased run_nr. The data does not contain that information, it's what
I'm trying to enrich it with and what I'm having trouble wrapping my head
around.
Your query (adopted a tiny bit) unfortunately results in:

select datetime, property_A, property_B
, first_value(datetime::time) over run_win as swap_time
, case
when lag(property_A) over time_win = property_A
and lag(property_B) over time_win = property_B
then coalesce(lag(run_nr) over time_win, 1)
else lag(run_nr) over time_win +1
end
, value
from process_data
window
time_win as (order by datetime)
, run_win as (partition by property_A, property_B order by datetime)
order by datetime
;

ERROR: column "run_nr" does not exist
LINE 6: then coalesce(lag(run_nr) over time_win, 1)
^
SQL state: 42703
Character: 221

I turned my example into a proper test-case (better late than never):

CREATE TABLE process_data (
datetime timestamp without time zone NOT NULL,
property_a text NOT NULL,
property_b text NOT NULL,
value numeric(12,3)
);

COPY process_data (datetime, property_a, property_b, value) FROM stdin;
2020-09-03 15:06:00 tea earl grey 0.230
2020-09-03 15:07:00 tea earl grey 0.220
2020-09-03 15:08:00 tea ceylon 0.340
2020-09-03 15:09:00 coffee cappucino 0.450
2020-09-03 15:10:00 coffee cappucino 0.430
2020-09-03 15:11:00 tea earl grey 0.230
\.

With the desired result (note that swap_time and run_nr are calculated
columns):
datetime | property_a | property_b | swap_time | run_nr | value
---------------------+------------+------------+-----------+--------+-------
2020-09-03 15:06:00 | tea | earl grey | 15:06:00 | 1 | 0.230
2020-09-03 15:07:00 | tea | earl grey | 15:06:00 | 1 | 0.220
2020-09-03 15:08:00 | tea | ceylon | 15:08:00 | 2 | 0.340
2020-09-03 15:09:00 | coffee | cappucino | 15:09:00 | 3 | 0.450
2020-09-03 15:10:00 | coffee | cappucino | 15:09:00 | 3 | 0.430
2020-09-03 15:11:00 | tea | earl grey | 15:06:00 | 4 | 0.230
(6 rows)

I've been looking around on the Internet in the meantime, and it seems
people either solve this with a recursive CTE (referencing the previous row
by row_number() over (...)) or by writing a set-returning function that
walks over the data in datetime order using a cursor.

Since the actual query is growing more and more state-tracking flags, using
a function has the added benefit that referencing state columns from the
previous row gets a lot easier (lots of repeated window functions
otherwise). It would become a procedural solution instead of a set-based
one, but considering that this data is order-sensitive (on datetime),
that's probably what a set-based solution would also end up doing anyway.

Regards,
Alban.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message mohand oubelkacem makhoukhene 2020-09-04 10:45:21 Implement a new data type
Previous Message Yang, Rong 2020-09-04 08:20:56 how to get top plan of GatherMerge in OSS10