From: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Definitional issue: stddev_pop (and related) for 1 input |
Date: | 2020-06-13 10:06:12 |
Message-ID: | CAEZATCWd_Z2RVpOQoMpRZBeAqx84yj0UXjtR6+ijdHPPBaz9mw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, 12 Jun 2020 at 20:53, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I wrote:
> > Before v12, stddev_pop() had the following behavior with just a
> > single input value:
> > ...
> > As of v12, though, all three cases produce 0. I am not sure what
> > to think about that with respect to an infinity input, but I'm
> > quite sure I don't like it for NaN input.
>
> While I'm still not sure whether there's an academic argument that
> zero is a reasonable stddev value for a single input that is Inf,
> it seems to me that backwards compatibility is a sufficient reason
> for going back to producing NaN for that.
Yeah, it was an oversight, not considering that case. I think the
academic argument could equally well be made that the result should be
NaN if any input is Inf or NaN, even if there's only one input (it's
effectively "Inf - Inf" or "NaN - NaN"), so I agree that backwards
compatibility clinches it.
> Hence, attached are some proposed patches. 0001 just adds test
> cases demonstrating the current behavior; then 0002 makes the
> proposed code change. It's easy to check that the test case results
> after 0002 match what v11 produces.
Those both look reasonable to me.
> 0003 deals with a different problem which I noted in [1]: the numeric
> variants of var_samp and stddev_samp also do the wrong thing for a
> single special input. Their disease is that they produce NaN for a
> single NaN input, where it seems more sensible to produce NULL.
> At least, NULL is what we get for the same case with the float
> aggregates, so we have to change one or the other set of functions
> if we want consistency.
Hmm, yeah it's a bit annoying that they're different. NULL seems like
the more logical result -- sample standard deviation isn't defined for
a sample of 1, so why should it be different if that one value is NaN.
The patch looks reasonable, except I wonder if all compilers are smart
enough to realise that totCount is always initialised.
> I propose back-patching 0001/0002 as far as v12, since the failure
> to match the old outputs seems like a pretty clear bug/regression.
> However, I'd be content to apply 0003 only to HEAD. That misbehavior
> is very ancient, and the lack of complaints suggests that few people
> really care about this fine point.
Makes sense.
Regards,
Dean
From | Date | Subject | |
---|---|---|---|
Next Message | Fujii Masao | 2020-06-13 11:37:47 | Re: Resetting spilled txn statistics in pg_stat_replication |
Previous Message | Dong Wook Lee | 2020-06-13 09:34:46 | Re: Add tap test for --extra-float-digits option |