Quick Links

Re: rand48 replacement

From:	Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To:	Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: rand48 replacement
Date:	2021-07-02 08:31:41
Message-ID:	CAEZATCVpnOjrwU=4712OPjNDqV2sVkkK7XHDiR-nzYDL8N6z=Q@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, 1 Jul 2021 at 22:18, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
>
> > However, this patch seems to be replacing that with a simple
> > modulo operation, which is perhaps the worst possible way to do it.
>
> The modulo operation is biased for large ranges close to the limit, sure.
> Also, the bias is somehow of the same magnitude as the FP multiplication
> approach used previously, so the "worst" has not changed much, it is
> really the same as before.
>

It may be true that the bias is of the same magnitude as FP multiply,
but it is not of the same nature. With FP multiply, the
more-likely-to-be-chosen values are more-or-less evenly distributed
across the range, whereas modulo concentrates them all at one end,
making it more likely to bias test results.

It's worth paying attention to how other languages/libraries implement
this, and basically no one chooses the modulo method, which ought to
raise alarm bells. Of the biased methods, it has the worst kind of
bias and the worst performance.

If a biased method is OK, then the biased integer multiply method
seems to be the clear winner. This requires the high part of a
64x64-bit product, which is trivial if 128-bit integers are available,
but would need a little more work otherwise. There's some code in
common/d2s that might be suitable.

Most other implementations tend to use an unbiased method though, and
I think it's worth doing the same. It might be a bit slower, or even
faster depending on implementation and platform, but in the context of
the DB as a whole, I don't think a few extra cycles matters either
way. The method recommended at the very end of that blog seems to be
pretty good all round, but any of the other commonly used unbiased
methods would probably be OK too.

Regards,
Dean

In response to

Re: rand48 replacement at 2021-07-01 21:18:44 from Fabien COELHO

Responses

Re: rand48 replacement at 2021-07-02 21:51:42 from Fabien COELHO

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	David Rowley	2021-07-02 08:39:44	Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Previous Message	Ronan Dunklau	2021-07-02 07:53:52	Re: Add proper planner support for ORDER BY / DISTINCT aggregates