From: | David Fetter <david(at)fetter(dot)org> |
---|---|
To: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: multivariate statistics (v19) |
Date: | 2017-02-12 19:42:07 |
Message-ID: | 20170212194207.GA2120@fetter.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, Feb 12, 2017 at 10:35:04AM +0000, Dean Rasheed wrote:
> On 11 February 2017 at 01:17, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> > Thanks for the feedback, I'll fix this. I've allowed myself to be a bit
> > sloppy because the number of attributes in the statistics is currently
> > limited to 8, so the overflows are currently not an issue. But it doesn't
> > hurt to make it future-proof, in case we change that mostly artificial limit
> > sometime in the future.
> >
>
> Ah right, so it can't overflow at present, but it's neater to have an
> overflow-proof algorithm.
>
> Thinking about the exactness of the division steps is quite
> interesting. Actually, the order of the multiplying factors doesn't
> matter as long as the divisors are in increasing order. So in both my
> proposal:
>
> result = 1
> for (i = 1; i <= k; i++)
> result = (result * (n-k+i)) / i;
>
> and David's proposal, which is equivalent but has the multiplying
> factors in the opposite order, equivalent to:
>
> result = 1
> for (i = 1; i <= k; i++)
> result = (result * (n-i+1)) / i;
>
> the divisions are exact at each step. The first time through the loop
> it divides by 1 which is trivially exact. The second time it divides
> by 2, having multiplied by 2 consecutive factors, one of which is
> therefore guaranteed to be divisible by 2. The third time it divides
> by 3, having multiplied by 3 consecutive factors, one of which is
> therefore guaranteed to be divisible by 3, and so on.
Right. You know you can use integer division, which make sense as
permutations of discrete sets are always integers.
> My approach originally seemed more logical to me because of the way it
> derives from the recurrence relation binomial(n, k) = binomial(n-1,
> k-1) * n / k, but they both work fine as long as they have suitable
> overflow checks.
Right. We could even cache those checks (sorry) based on data type
limits by architecture and OS if performance on those operations ever
matters that much.
> It's also interesting that descriptions of this algorithm tend to
> talk about setting k to min(k, n-k) at the start as an optimisation
> step, as I did in fact, whereas it's actually more than that -- it
> helps prevent unnecessary intermediate overflows when k > n/2. Of
> course, that's not a worry for the current use of this function, but
> it's good to have a robust algorithm.
Indeed. :)
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
From | Date | Subject | |
---|---|---|---|
Next Message | David Rowley | 2017-02-12 23:06:31 | Re: Improve OR conditions on joined columns (common star schema problem) |
Previous Message | Vladimir Rusinov | 2017-02-12 17:55:56 | Re: WIP: About CMake v2 |