Re: Calculating Median value

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Chandru Aroor <caroor(at)yahoo(dot)com>
Cc: "pgsql-novice(at)lists(dot)postgresql(dot)org" <pgsql-novice(at)lists(dot)postgresql(dot)org>
Subject: Re: Calculating Median value
Date: 2018-05-06 18:13:47
Message-ID: 23810.1525630427@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

Chandru Aroor <caroor(at)yahoo(dot)com> writes:
> Thanks Tom.  You are actually correct. In talking with the business user, we need to get the 50th percentile, and picking the higher value for even number of members,  so median as Excel calculates it is actually not appropriate. Will aggregate percentile_cont, or possibly percentile_disc do it?

If you don't want to interpolate between the two middle values, you must
use percentile_disc not percentile_cont --- that's exactly what the
difference is between those two functions.

I think though that percentile_disc picks the first not the second of the
two middle values. You can fix that by reversing the sort order, so it'd
look like

SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY x DESC) FROM ...

Here's some examples;

regression=# SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY x) FROM (VALUES (1),(3),(7),(9)) v(x);
percentile_disc
-----------------
3
(1 row)

regression=# SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY x DESC) FROM (VALUES (1),(3),(7),(9)) v(x);
percentile_disc
-----------------
7
(1 row)

regression=# SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY x) FROM (VALUES (1),(3),(7),(9)) v(x);
percentile_cont
-----------------
5
(1 row)

regards, tom lane

In response to

Browse pgsql-novice by date

  From Date Subject
Next Message Amit S. 2018-05-06 23:04:25 Re: Postgres warm standby with delay
Previous Message Chandru Aroor 2018-05-06 17:06:23 Re: Calculating Median value