From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
Cc: | Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Michael Paquier <michael(at)paquier(dot)xyz>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [PATCH] random_normal function |
Date: | 2023-01-09 23:38:39 |
Message-ID: | 67201.1673307519@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
> On Mon, 9 Jan 2023 at 18:52, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> (We could probably go further
>> than this, like trying to verify distribution properties. But
>> it's been too long since college statistics for me to want to
>> write such tests myself, and I'm not real sure we need them.)
> I played around with the Kolmogorov–Smirnov test, to test random() for
> uniformity. The following passes roughly 99.9% of the time:
Ah, cool, thanks for this code.
> With a one-in-a-thousand chance of failing, if you wanted something
> with around a one-in-a-billion chance of failing, you could just try
> it 3 times:
> SELECT ks_test_uniform_random() OR
> ks_test_uniform_random() OR
> ks_test_uniform_random();
> but it feels pretty hacky, and probably not really necessary.
That seems like a good way, because I'm not satisfied with
one-in-a-thousand odds if we want to remove the "ignore" marker.
It's still plenty fast enough: on my machine, the v2 patch below
takes about 19ms, versus 22ms for the script as it stands in HEAD.
> Rigorous tests for other distributions are harder, but also probably
> not necessary if we have confidence that the underlying PRNG is
> uniform.
Agreed.
>> BTW, if this does bring the probability of failure down to the
>> one-in-a-billion range, I think we could also nuke the whole
>> "ignore:" business, simplifying pg_regress and allowing the
>> random test to be run in parallel with others.
> I didn't check the one-in-a-billion claim, but +1 for that.
I realized that we do already run random in a parallel group;
the "ignore: random" line in parallel_schedule just marks it
as failure-ignorable, it doesn't schedule it. (The comment is a
bit misleading about this, but I want to remove that not rewrite it.)
Nonetheless, nuking the whole ignore-failures mechanism seems like
good cleanup to me.
Also, I tried this on some 32-bit big-endian hardware (NetBSD on macppc)
to verify my thesis that the results of random() are now machine
independent. That part works, but the random_normal() tests didn't;
I saw low-order-bit differences from the results on x86_64 Linux.
Presumably, that's because one or more of sqrt()/log()/sin() are
rounding off a bit differently there. v2 attached deals with this by
backing off to "extra_float_digits = 0" for that test. Once it hits the
buildfarm we might find we have to reduce extra_float_digits some more,
but that was enough to make NetBSD/macppc happy.
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
improve-random-tests-2.patch | text/x-diff | 11.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2023-01-09 23:38:58 | Re: POC: Lock updated tuples in tuple_update() and tuple_delete() |
Previous Message | Jelte Fennema | 2023-01-09 23:25:14 | Re: Allow +group in pg_ident.conf |