From: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
---|---|
To: | Madeleine Thompson <madeleineth(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: BUG #15307: Low numerical precision of (Co-) Variance |
Date: | 2018-09-27 13:22:39 |
Message-ID: | CAEZATCVgwHdXQPwihqjpFOit9_KuUaP9ZYdRb0HV4+-bwj1p0A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs pgsql-hackers |
On 27 September 2018 at 06:12, Madeleine Thompson <madeleineth(at)gmail(dot)com> wrote:
> This is my first PostgreSQL review. Hopefully I'm getting it right. I independently ran into the bug in question and found that the author had already written a patch. I'm happy the author wrote this.
>
Thanks for the review. And yes, this sort of review is exactly what we need.
> (1) I am not experienced with PostgreSQL internals, so I can't comment on the coding style or usage of internal functions.
>
> (2) The patch applies cleanly to ce4887bd025b95c7b455fefd817a418844c6aad3. "make check", "make check-world", and "make install" pass.
>
> (3) The patch addresses the numeric inaccuracy reported in the bug. (Yay!) I believe the tests are sufficient to confirm that it works as intended. I don't think the test coverage *before* this patch was sufficient, and I appreciate the improved test coverage of the combine functions. I verified the new tests manually.
>
Excellent. Thanks for testing.
> (4) The comments cite Youngs & Cramer (1971). This is a dated citation. It justifies its algorithms based on pre-IEEE754 single-precision arithmetic. Most notably, it assumes that floating-point division is not exact. That said, the algorithm is still a good choice; it is justified now because compared to the standard Welford variance, it is less prone to causing pipeline stalls on a modern CPU. Maybe link to Schubert & Gentz 2018 (https://dl.acm.org/citation.cfm?id=3223036) instead. The new float8_combine combine code is (almost) S&G eqn. 22; the float8_accum code is S&G eqn. 28. float8_regr_accum and float8_regr_combine are S&G eqn. 22.
>
Hmm, I think that Youngs & Cramer should be cited as the original
inventors of the algorithm (which pre-dates the first version of
IEEE754 by a few years), although I'm happy to also cite Schubert &
Gentz as a more modern justification for the choice of algorithm.
Regarding the combine functions, I think perhaps Chan, Golub & LeVeque
[1] should be cited as the original inventors. I think they only did
variance, not covariance, but that's a straightforward generalisation
of their work.
[1] Updating Formulae and a Pairwise Algorithm for Computing Sample
Variances, T. F. Chan, G. H. Golub & R. J. LeVeque, COMPSTAT 1982.
> (5) I think the comment /* Watch out for roundoff error producing a negative variance */ and associated check are obsolete now.
>
Ah, good point. We clearly only ever add positive quantities to Sxx
and Syy, and likewise when they are combined, so there is no way that
they can ever become negative now. Another neat consequence of this
algorithm.
I'll post an updated patch in a while.
Regards,
Dean
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2018-09-27 13:47:19 | Re: BUG #15407: [minor] build depends on $MAKELEVEL being 0 at top Makefile |
Previous Message | PG Bug reporting form | 2018-09-27 11:59:23 | BUG #15408: Postgresql bad planning with multicolumn gist and search with empty results |
From | Date | Subject | |
---|---|---|---|
Next Message | Jesper Pedersen | 2018-09-27 13:59:50 | Re: Index Skip Scan |
Previous Message | Kyotaro HORIGUCHI | 2018-09-27 13:00:49 | Re: shared-memory based stats collector |