Re: Create custom aggregate function and custom sfunc

From: nha <lyondif02(at)free(dot)fr>
To: Jasmin Dizdarevic <Jasmin(dot)Dizdarevic(at)gmail(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Create custom aggregate function and custom sfunc
Date: 2009-07-03 03:48:43
Message-ID: 4A4D7F9B.2020107@free.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Hello,

Le 2/07/09 23:21, Greg Stark a écrit :
> On Thu, Jul 2, 2009 at 3:48 PM, Jasmin
> Dizdarevic<jasmin(dot)dizdarevic(at)gmail(dot)com> wrote:
>> customer ; seg
>> 111 ; L1
>> 111 ; L2
>> 111 ; L1
>> 222 ; L3
>> 222 ; L3
>> 222 ; L2
>>
>> the result should look like this:
>>
>> 111: L1 - because L1 is higher than L2 and ratio of L1 : L2 is 2 : 1
>> 222: L3 - because L3 is higher than L2 and ratio of L3 : L2 is 2 : 1
>
> Are you just looking for the most frequent seg for each customer?
>
> select distinct on (customer) customer,seg
> from (select customer, seg, count(*) as n from tab group by seg)
> order by customer, n desc
>
> That doesn't give the ratios though.
>

Some errors would occur with the above query from within the subquery
because of a projection on customer column that is not a grouped column
(neither an uniquely identified column by seg--seg is assumed not to be
a primary key here according to the given examples of value). By the
way, the ordered column n is not a member of the projected columns.

Nevertheless, in the same direction as Greg Stark, the following query
would approach the target result:

SELECT T5A.customer, T5A.seg
FROM (
SELECT T2.customer, T2.seg, COUNT(*) AS nb
FROM cst T2
GROUP BY T2.customer, T2.seg
) T5A INNER JOIN (
SELECT T4.customer, MAX(T4.nb) AS maxNb
FROM (
SELECT T2B.customer, T2B.seg, COUNT(*) AS nb
FROM cst T2B
GROUP BY T2B.customer, T2B.seg
) T4
GROUP BY T4.customer
) T5B
ON T5A.customer = T5B.customer
AND T5A.nb = T5B.maxNb

where 'cst' is the reference table including 'customer' and 'seg'
columns. This query considers a join between two tables:
- the first table, aliased T5A, counts the multiplicity for each couple
of 'customer' and 'seg';
- the secund table, aliased T5B, reveals the highest multiplicity for
each 'customer' with regard to each 'seg' attached to customer.
Then the join only retains the couples of 'customer' and 'seg' whom
multiplicity equals the higher for the 'customer' of the current couple.

Each multiplicity may be easily added to the resulting records by
spanning the projected columns with T5A.nb column. A sort of ratio may
also be added; one aggregated column may first be added to table T4 as
SUM(T4.nb) -- that is, sum of multiplicity for each customer; then the
resulting join may compute the ratio between the highest multiplicity
and the sum of multiplicity for each customer, in addition to the seg
(or list of seg) for which the multiplicity is the highest.

Here is the modified query:

SELECT T5A.customer, T5A.seg, T5A.nb/T5B.sumNb
FROM (
SELECT T2.customer, T2.seg, COUNT(*) AS nb
FROM cst T2
GROUP BY T2.customer, T2.seg
) T5A INNER JOIN (
SELECT T4.customer, MAX(T4.nb) AS maxNb, SUM(T4.nb) AS sumNb
FROM (
SELECT T2B.customer, T2B.seg, COUNT(*) AS nb
FROM cst T2B
GROUP BY T2B.customer, T2B.seg
) T4
GROUP BY T4.customer
) T5B
ON T5A.customer = T5B.customer
AND T5A.nb = T5B.maxNb

As is, this query may result to multiple couples of customer and seg
with the same customer value when many seg relie as many times as many
others for the current customer. This "edge effect" may be avoided in
many ways depending on the original purpose.

Regards.

--
nha / Lyon / France.

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Jasmin Dizdarevic 2009-07-03 07:14:44 Re: Create custom aggregate function and custom sfunc
Previous Message Greg Stark 2009-07-02 21:21:01 Re: Create custom aggregate function and custom sfunc