From: | Steve Baldwin <steve(dot)baldwin(at)gmail(dot)com> |
---|---|
To: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Re: regr_slope returning NULL |
Date: | 2019-03-24 19:37:58 |
Message-ID: | CAKE1AiajxEyrEqJ0uiJdMpft0u06L6y-LPWn8Ds7R8nz42gD+g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Thanks Dean, that's really helpful. Because my x axis values are actually
derived from 'extract(epoch from tstz_col)', it is simple for me to
subtract an offset.
Cheers,
Steve
On Sun, Mar 24, 2019 at 7:55 PM Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
wrote:
> On Sun, 24 Mar 2019 at 08:01, Steve Baldwin <steve(dot)baldwin(at)gmail(dot)com>
> wrote:
> >
> > Thanks Tom,
> >
> > I've tried this on 11.2 (OS X 10.14.3, installed locally) and 10.6 (AWS
> RDS) instances with identical results. The values you show are identical
> to those returned by Oracle so that's great but why am I seeing different
> results?
> >
>
> This is caused by the large magnitude of the ts values, which causes a
> cancellation error in the Sxx calculation, which is what commit
> e954a727f0 fixed in HEAD, and will be available in PG12 [1].
>
> You can see that by including regr_sxx in the results. With PG11, this
> gives the following:
>
> select id, regr_slope(elapsed, ts) as trend, regr_sxx(elapsed, ts) as sxx
> from sb1 group by id;
>
> id | trend | sxx
> ------+----------------------+-------------
> c742 | | 0
> 317e | | 0
> 5fe6 | 5.78750952760444e-06 | 19905896448
> 3441 | | 0
> (4 rows)
>
> Those zeros for Sxx are the result of calculating the sum of the
> squares of ts values and then subtracting off the square of the mean,
> which results in a complete loss of accuracy because the intermediate
> values are so large they don't differ according to double precision
> arithmetic.
>
> A workaround in PG11 is to just offset the ts values by something
> close to their mean (offsetting the ts values by a constant amount
> shouldn't affect the mathematical result, but does eliminate the
> cancellation errors):
>
> select id, regr_slope(elapsed, ts-1552892914) as trend,
> regr_sxx(elapsed, ts-1552892914) as sxx
> from sb1 group by id;
>
> id | trend | sxx
> ------+----------------------+--------------------
> c742 | 19.6077357654714 | 0.0468182563781738
> 317e | -1.08385104429772 | 59.2381523980035
> 5fe6 | 5.78750948360697e-06 | 19905896596.7403
> 3441 | -3.82839508895523 | 20.1098628044128
> (4 rows)
>
>
> For PG12 the algorithm for calculating these quantities has been
> changed by e954a727f0, so the result should be more accurate
> regardless of the offset:
>
> select id, regr_slope(elapsed, ts) as trend, regr_sxx(elapsed, ts) as sxx
> from sb1 group by id;
>
> id | trend | sxx
> ------+----------------------+--------------------
> c742 | 19.6078587812905 | 0.0468179252929986
> 317e | -1.0838511987809 | 59.2381423694815
> 5fe6 | 5.78750948358674e-06 | 19905896596.7605
> 3441 | -3.82839546309736 | 20.1098619909822
> (4 rows)
>
> select id, regr_slope(elapsed, ts-1552892914) as trend,
> regr_sxx(elapsed, ts-1552892914) as sxx
> from sb1 group by id;
>
> id | trend | sxx
> ------+----------------------+--------------------
> c742 | 19.6078431374563 | 0.0468179999990382
> 317e | -1.08385109620679 | 59.2381495556381
> 5fe6 | 5.78750948360693e-06 | 19905896596.7403
> 3441 | -3.82839509931361 | 20.109862749992
> (4 rows)
>
> Regards,
> Dean
>
> [1] https://github.com/postgres/postgres/commit/e954a727f0
>
From | Date | Subject | |
---|---|---|---|
Next Message | Christian Henz | 2019-03-25 10:14:49 | Re: Forks of pgadmin3? |
Previous Message | Chris Travers | 2019-03-24 14:53:38 | Re: When to store data that could be derived |