Quick Links

Re: how to identify outliers

From:	Sam Mason <sam(at)samason(dot)me(dot)uk>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Re: how to identify outliers
Date:	2009-10-28 11:00:39
Message-ID:	20091028110039.GQ5407@samason.me.uk
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

> Rhys A.D. Stewart wrote:
> >I would like to remove the outliers in distance

As others have said; an "outlier" is normally a human call and not
something that's generally valid to do automatically. The operator
would probably want to go in and look to see why it's that far out and
either fix the typo or do whatever else it takes to "fix" the problem.

On Tue, Oct 27, 2009 at 04:58:23PM -0700, John R Pierce wrote:
> you could probably do something with a standard deviation that is more
> accurate for large sets than just tossing the 2 outliers.

I'd agree, stddev is probably best and the following should do something
reasonable for what the OP was asking:

SELECT d.*
FROM data d, (
SELECT avg(distance), stddev(distance) FROM data) x
WHERE abs(d.distance - x.avg) < x.stddev * 2;

--
Sam http://samason.me.uk/

In response to

Re: how to identify outliers at 2009-10-27 23:58:23 from John R Pierce

Responses

Re: how to identify outliers at 2009-10-28 12:34:02 from Chris Spotts

Browse pgsql-general by date

	From	Date	Subject
Next Message	Chris Spotts	2009-10-28 12:34:02	Re: how to identify outliers
Previous Message	Alban Hertroys	2009-10-28 10:57:59	Re: Slow running query with views...how to increase efficiency? with index?