Re: select from grouped data

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Roger Mason <rmason(at)mun(dot)ca>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: pgsql-novice <pgsql-novice(at)postgresql(dot)org>
Subject: Re: select from grouped data
Date: 2021-08-25 20:42:56
Message-ID: 20210825204256.GX17906@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

Greetings,

* Roger Mason (rmason(at)mun(dot)ca) wrote:
> I have written this function to process some multiline text data into a
> table with both text & numeric columns:
>
> CREATE OR REPLACE FUNCTION get_final_energy ()
> RETURNS TABLE (
> jid text,
> "timestamp" text,
> scf integer,
> energy double precision
> )
> AS $$
> WITH a AS (
> SELECT
> jid,
> regexp_split_to_table(totenergy_out, '\n') AS teo
> FROM
> results
> ),
> b AS (
> SELECT
> results.jid,
> results. "timestamp",
> cast(
> CASE WHEN split_part(a.teo, ' ', 2) = '' THEN
> '0'
> ELSE
> split_part(a.teo, ' ', 2)
> END AS integer) AS scf,
> cast(
> CASE WHEN split_part(a.teo, ' ', 3) = '' THEN
> '0.0'
> ELSE
> split_part(a.teo, ' ', 3)
> END AS double precision) AS energy
> FROM
> results,
> a
> WHERE
> results.jid = a.jid
> GROUP BY
> results.jid,
> results. "timestamp",
> a.teo
> --HAVING
> -- scf = max(scf)
> ORDER BY
> timestamp ASC,
> scf DESC
> )
> SELECT
> *
> FROM
> b;
>
> $$
> LANGUAGE sql;
>
> The output looks like:
>
> jid | timestamp | scf | energy
> ------------+-----------------+-----+----------------
> 1250_1 | 20210805-114634 | 18 | -1316.43700819
> 1250_1 | 20210805-114634 | 17 | -1316.43700825
> 1250_1 | 20210805-114634 | 16 | -1316.4370097
> 1250_1 | 20210805-114634 | 15 | -1316.43700991
> 1250_1 | 20210805-114634 | 14 | -1316.43699775
> 1250_1 | 20210805-114634 | 13 | -1316.43699117
> 1250_1 | 20210805-114634 | 12 | -1316.43750771
> 1250_1 | 20210805-114634 | 11 | -1316.43805358
> 1250_1 | 20210805-114634 | 10 | -1316.43857192
> 1250_1 | 20210805-114634 | 9 | -1316.43070942
> 1251_1 | 20210806-062539 | 18 | -1316.43700819
> 1251_1 | 20210806-062539 | 17 | -1316.43700826
> ....
>
> What I want is to get (for each group) the energy corresponding to the
> maximum value of scf.

* David G. Johnston (david(dot)g(dot)johnston(at)gmail(dot)com) wrote:
> >
> >
> > The output looks like:
> >
> > jid | timestamp | scf | energy
> > ------------+-----------------+-----+----------------
> > 1250_1 | 20210805-114634 | 18 | -1316.43700819
> > 1250_1 | 20210805-114634 | 17 | -1316.43700825
> > 1250_1 | 20210805-114634 | 16 | -1316.4370097
> > 1250_1 | 20210805-114634 | 15 | -1316.43700991
> > 1250_1 | 20210805-114634 | 14 | -1316.43699775
> > 1250_1 | 20210805-114634 | 13 | -1316.43699117
> > 1250_1 | 20210805-114634 | 12 | -1316.43750771
> > 1250_1 | 20210805-114634 | 11 | -1316.43805358
> > 1250_1 | 20210805-114634 | 10 | -1316.43857192
> > 1250_1 | 20210805-114634 | 9 | -1316.43070942
> > 1251_1 | 20210806-062539 | 18 | -1316.43700819
> > 1251_1 | 20210806-062539 | 17 | -1316.43700826
> > ....
> >
> > What I want is to get (for each group) the energy corresponding to the
> > maximum value of scf.
> >
> >
> SELECT DISTINCT ON (jid) jid, timestamp, scf, energy [...] ORDER BY jid,
> scf DESC

While this works, it's generally better to use a LATERAL join as that's
part of the SQL standard while DISTINCT ON isn't. Using a LATERAL join
also would allow you to have multiple rows (top-N) if you wanted. You'd
do that using:

WITH jids AS (SELECT jid FROM results GROUP BY jid)
SELECT jids.jid, t.ts, t.scf, t.energy
FROM jids CROSS JOIN LATERAL
(SELECT ts, scf, energy FROM results
WHERE results.jid = jids.jid
ORDER BY scf DESC
LIMIT 1) AS t
;

A couple of notes on this: don't name a column "timestamp" and when it's
a timestamp, use the 'timestamptz' data type, not text. Your jids sure
look like they should just be numbers instead of text too. If you
already have a distinct set of jids somewhere (like in another table),
you could use that as the source table instead of the CTE that I'm using
above.

Thanks,

Stephen

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message James Keener 2021-08-25 21:37:42 Re: select from grouped data
Previous Message Roger Mason 2021-08-25 13:59:10 Re: select from grouped data