Re: Re: sampling.c and potential divisions by 0 ang log(0) with tablesample and ANALYZE in 9.5

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject:	Re: Re: sampling.c and potential divisions by 0 ang log(0) with tablesample and ANALYZE in 9.5
Date:	2015-07-01 07:56:29
Message-ID:	CAB7nPqT+sJE8x3q7kuMk7FrDSCs8ZThJUwVFf-a-WuhS__MeYQ@mail.gmail.com
Lists:	pgsql-bugs

On Wed, Jul 1, 2015 at 1:17 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Petr Jelinek <petr(at)2ndquadrant(dot)com> writes:
>> On 2015-06-25 10:01, Michael Paquier wrote:
>>> I think that we should change the returned double to be (0.0,1.0]
>
>> Agreed.
>
> I find this to be a pretty bad idea. That definition is simply weird;
> where else in the world will you find a random number generator that does
> that? What are the odds that any callers are actually designed for that
> behavior?
>
> Another problem is that we consider anl_random_fract() to be an exported
> API, and the very longstanding definition of that is that the result is
> in (0,1), excluding both endpoints. Whatever we do with
> sampler_random_fract(), we'd better make sure that anl_random_fract()
> preserves that behavior, else we are likely to break third-party modules.

Wait a minute... Yes, you are right. I clearly missed the fact that in
9.4 and earlier the range of values returned by anl_random_fract() does
not include 1. I thought it did; evidently I misread the code...

> A simple fix would be to adjust sampler_random_fract to disallow 0
> as result, say by repeating the pg_erand48 call if it produces 0.
> I'm not sure if that would throw off any of the math in the new
> tablesample-related callers. If it would, I'm inclined to fix the
> problem call-site-by-call-site, rather than inventing a definition
> of sampler_random_fract() that fails to satisfy the POLA.

Agreed. Disallowing 0 in sampler_random_fract() looks like a good answer
to that. Looking at the tablesample code, for the Bernoulli trial I
recall that the probabilities of success and failure both lie in
(0,1) (if p is the success rate, the failure rate is 1 - p). I am not
sure about Knuth's Algorithm S used for the system sampling, though it
looks OK from a purely logical viewpoint.
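
For illustration, here is a minimal sketch of the retry idea discussed
above. It is not the attached patch: it assumes POSIX erand48() as a
stand-in for pg_erand48(), and the function name random_fract_nonzero()
is hypothetical.

```c
#define _XOPEN_SOURCE 600		/* exposes erand48() on strict compilers */
#include <stdlib.h>

/*
 * Hypothetical stand-in for sampler_random_fract(): return a value in
 * the open interval (0,1).  erand48() yields values in [0,1), so only
 * an exact 0 has to be rejected; 1 is never produced.
 */
static double
random_fract_nonzero(unsigned short state[3])
{
	double		r;

	/* Redraw on the (rare) exact zero so callers can safely use log(r) or 1/r */
	do
	{
		r = erand48(state);
	} while (r == 0.0);

	return r;
}
```

With this shape, callers that take log() of the result or divide by it
would not need a per-call-site guard; whether the extra draw disturbs
the sampling math is the open question raised above.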
--
Michael

Attachment	Content-Type	Size
20150701_tablesample_random.patch	text/x-patch	542 bytes

In response to

Re: Re: sampling.c and potential divisions by 0 ang log(0) with tablesample and ANALYZE in 9.5 at 2015-06-30 16:17:15 from Tom Lane

Responses

Re: Re: sampling.c and potential divisions by 0 ang log(0) with tablesample and ANALYZE in 9.5 at 2015-07-02 00:44:10 from Michael Paquier
