From: | Scott Bailey <artacus(at)comcast(dot)net> |
---|---|
To: | |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: how to identify outliers |
Date: | 2009-10-27 23:04:47 |
Message-ID: | 4AE77C8F.1000900@comcast.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Rhys A.D. Stewart wrote:
> Im asking how to get the ones that dont fall near the avg.... so for
> example lets say i have the following distances:
> 10,11,12,11,10,9,9,10,11,12,10,11,99
>
> then 99 would be an outlier. the avg would be like 16 or 17 i reckon
> with the 99. so i want a way to find aan outlier, remove it and then
> recalcuate the avg...and then i'd get a 'better' avg.....
>
> i did some seraching about outliers and most of hits are about R or
> spss or some other statistical package.....so looking for a way to do
> it wholly in pgsql.
>
>
> Rhys
>
> On Tue, Oct 27, 2009 at 4:53 PM, Ben Chobot <bench(at)silentmedia(dot)com> wrote:
>> Are you asking how to find the average and standard deviation? Or how to
>> compare the your data against some set values? Perhaps an example would be
>> appropriate; it's not very clear to me what you're asking.
>>
>> Rhys A.D. Stewart wrote:
>>> Hey all,
>>> I have the following table: data(pnum text, distance float8, route text).
>>> I would like to remove the outliers in distance, i.e. lets say i get
>>> the avg dist of pnum for each route and the std deviation of the
>>> distance what is the best way to identify the outliers?
>>>
>>>
>>> Rhys.
Oh, so you want to "cook" your data? I don't agree with that
conceptually, but:
WITH base AS (
SELECT random(1, 100)::int AS i
FROM generate_series(1, 100) i
),
stats AS (
SELECT avg(i) AS dist_avg, stddev(i) AS dist_dev FROM base
)
SELECT count(i), avg(i) AS new_avg, MIN(stats.dist_avg) AS old_avg,
stddev(i) AS new_dev, MIN(stats.dist_dev) AS old_dev
FROM base, stats
WHERE base.i BETWEEN stats.dist_avg - dist_dev
AND stats.dist_avg + dist_dev
Scott Bailey
From | Date | Subject | |
---|---|---|---|
Next Message | Merlin Moncure | 2009-10-27 23:18:22 | Re: Slow running query with views...how to increase efficiency? with index? |
Previous Message | Alvaro Herrera | 2009-10-27 22:56:04 | Re: how to identify outliers |