Re: reduce many loosely related rows down to one

From: Marc Mamin <M(dot)Mamin(at)intershop(dot)de>
To: "pgsql-sql(at)postgresql(dot)org" <pgsql-sql(at)postgresql(dot)org>
Cc: "'webmaster(at)dhs-club(dot)com'" <webmaster(at)dhs-club(dot)com>, "'Torsten Grust'" <torsten(dot)grust(at)gmail(dot)com>
Subject: Re: reduce many loosely related rows down to one
Date: 2013-05-29 15:07:07
Message-ID: B6F6FD62F2624C4C9916AC0175D56D880CDF7FCE@jenmbs01.ad.intershop.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

> SELECT id,
> (array_agg(rspid))[1] AS rspid, -- (1)

for such cases, I have created an new aggregate function:

SELECT firstnotnull(rspid) AS rspid,

this avoid to collect first all rspid values to then keep only the first one...

CREATE OR REPLACE FUNCTION public.first_agg_nn ( anyelement, anyelement )
RETURNS anyelement AS $$
SELECT $1;
$$ LANGUAGE SQL IMMUTABLE STRICT
COST 1;

CREATE AGGREGATE public.firstnotnull (
sfunc = public.first_agg_nn,
basetype = anyelement,
stype = anyelement
);

regards,

Marc Mamin

> -----Original Message-----
> From: pgsql-sql-owner(at)postgresql(dot)org [mailto:pgsql-sql-
> owner(at)postgresql(dot)org] On Behalf Of Torsten Grust
> Sent: Dienstag, 28. Mai 2013 17:08
> To: pgsql-sql(at)postgresql(dot)org
> Subject: Re: [SQL] reduce many loosely related rows down to one
>
> On 25 May 2013, at 9:19, Bill MacArthur wrote (with possible
> deletions):
> > [...]
> > select * from test;
> >
> > id | rspid | nspid | cid | iac | newp | oldp | ppv | tppv
> > ----+-------+-------+-----+-----+------+------+---------+---------
> > 1 | 2 | 3 | 4 | t | | | |
> > 1 | 2 | 3 | | | 100 | | |
> > 1 | 2 | 3 | | | | 200 | |
> > 1 | 2 | 3 | | | | | | 4100.00
> > 1 | 2 | 3 | | | | | | 3100.00
> > 1 | 2 | 3 | | | | | -100.00 |
> > 1 | 2 | 3 | | | | | 250.00 |
> > 2 | 7 | 8 | 4 | | | | |
> > (8 rows)
> >
> > -- I want this result (where ppv and tppv are summed and the other
> > distinct values are boiled down into one row)
> > -- I want to avoid writing explicit UNIONs that will break if, say
> the
> > "cid" was entered as a discreet row from the row containing "iac"
> > -- in this example "rspid" and "nspid" are always the same for a
> given
> > ID, however they could possibly be absent for a given row as well
> >
> > id | rspid | nspid | cid | iac | newp | oldp | ppv | tppv
> > ----+-------+-------+-----+-----+------+------+---------+---------
> > 1 | 2 | 3 | 4 | t | 100 | 200 | 150.00 | 7200.00
> > 2 | 7 | 8 | 4 | | | | 0.00 | 0.00
>
> One possible option could be
>
> SELECT id,
> (array_agg(rspid))[1] AS rspid, -- (1)
> (array_agg(nspid))[1] AS nspid,
> (array_agg(cid))[1] AS cid,
> bool_or(iac) AS iac, -- (2)
> max(newp) AS newp, -- (3)
> min(oldp) AS oldp, -- (4)
> coalesce(sum(ppv), 0) AS ppv,
> coalesce(sum(tppv),0) AS tppv
> FROM test
> GROUP BY id;
>
>
> This query computes the desired output for your example input.
>
> There's a caveat here: your description of the problem has been
> somewhat vague and it remains unclear how the query should respond if
> the functional dependency id -> rspid does not hold. In this case, the
> array_agg(rspid)[1] in the line marked (1) will pick one among many
> different(!) rspid values.
> I don't know your scenario well enough to judge whether this would be
> an acceptable behavior. Other possible behaviors have been implemented
> in the lines (2), (3), (4) where different aggregation functions are
> used to reduce sets to a single value (e.g., pick the largest/smallest
> of many values ...).
>
> Cheers,
> --Torsten
>
>
> --
> | Torsten "Teggy" Grust
> | Torsten(dot)Grust(at)gmail(dot)com
>
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql(at)postgresql(dot)org) To make
> changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Vick Khera 2013-05-29 15:51:03 Re: foreign key to multiple tables depending on another column's value
Previous Message Rodrigo Rosenfeld Rosas 2013-05-29 13:58:09 foreign key to multiple tables depending on another column's value