Custom aggregate last_value_when

From: Ben Tilly <btilly(at)gmail(dot)com>
To: pgsql-sql(at)lists(dot)postgresql(dot)org
Subject: Custom aggregate last_value_when
Date: 2020-12-18 21:03:30
Message-ID: CANoac9UAaT+DdMuug=+za5MmzrO_g3YdmyVhzWw3bAYbj1U8Vg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

At work I've found the following custom aggregate function very
helpful. But I would like to extend it and I'm not sure how to do
this.

The function is last_value_when. When you call last_value_when(value,
cond) over w it gives you the last value in the window where cond was
true.

Here is my implementation.

-----

CREATE OR REPLACE FUNCTION last_value_when_sfunc(state anyelement,
value anyelement, cond BOOL) RETURNS anyelement AS
$$
SELECT CASE WHEN cond THEN value ELSE state END;
$$
LANGUAGE SQL;

COMMENT ON FUNCTION .last_value_when_sfunc (anyelement, anyelement,
bool) IS 'Helper function for tracking last matching in window';

CREATE AGGREGATE last_value_when(anyelement, bool) (
SFUNC = expression.last_value_when_sfunc,
STYPE = anyelement);

COMMENT ON AGGREGATE last_value_when (anyelement, bool) IS 'Aggregate
function for tracking the last value when a condition was true';

-----

Here is an example of its use.

SELECT timestamp,
, measurement_type
, value
, timestamp
, last_value_when(timestamp, measurement_type = 'foo') over w as
last_foo_timestamp
, last_value_when(value, measurement_type = 'foo') over w as last_foo_value
FROM some_table
WINDOW w AS (
ORDER BY timestamp,
CASE WHEN measurement_type = 'foo' THEN 0 ELSE 1 END
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)

This gives me for every row the last time/value for foo, no matter how
many rows back it might have been.

For working with time series I've found this incredibly powerful. In
fact I think that it is worth including in Postgres!

But here is my problem. I would like to build equivalents of lead/lag
with the same idea. That is where I currently do lag(some_column, 2)
I'd like to instead be able to say lag_when(some_column, 2,
some_condition). So I could pick out not just the last measurement of
foo, but the last 3 measurements of foo.

How would I do that?

Browse pgsql-sql by date

  From Date Subject
Next Message ml 2020-12-21 01:35:32 dblink_exists
Previous Message Anders Svensson 2020-12-03 09:52:56 Re: Join push down on FDW partitions