FYI: geometric means in one step without custom functions

From: Andrew Gould <andrewgould(at)yahoo(dot)com>
To: Postgres Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: FYI: geometric means in one step without custom functions
Date: 2003-07-06 15:38:07
Message-ID: 20030706153807.85712.qmail@web13406.mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

A long time ago, I emailed this list about calculating
a geometric mean in PostgreSQL. Creating a custom
function didn't work because the process of
multiplying the values from each record resulted in
numbers that exceeded the limits for the size of a
number very quickly when dealing with large
populations.

I have learned, since, that you can achieve the same
end by replacing certain steps with log functions.
(Someone who is very good at math showed me this -- I
just tested the results and wrote the sql.) This
method has 2 great benefits:

1. The method pushes the limits of deriving geometric
mean calculations considerably.
2. The default installation of PostgreSQL has
everything needed to perform the calculation.

The sql statement below calculates the geometric mean
of the lengths of stay (gm_los) for patients, grouped
by diagnostic related group and fiscal year.

The population (cases) and average length of stay
(avg_los) are also reported.

Note 1. Make sure you are calculating geometric mean
on a data type that has values to the right of the
decimal point.

Note 2. You cannot use a log function on a value <= 0.
Thus, I filtered for los > 0.

select drg_no, fy, count(pt_id) as cases,
avg(los) as avg_los,
exp(sum(ln(los::real)) * (1.0/count(pt_id))) as gm_los

from case_current where los > 0
group by drg_no, fy;

Have fun!

Andrew Gould

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2003-07-06 15:44:42 Re: Sequence Roll Over
Previous Message Mat 2003-07-06 14:36:46 Sequence Roll Over