Re: rand48 replacement

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Aleksander Alekseev <aleksander(at)timescale(dot)com>
Cc: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: rand48 replacement
Date: 2021-11-26 18:26:39
Message-ID: 438984.1637951199@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Aleksander Alekseev <aleksander(at)timescale(dot)com> writes:
> It looks like the patch is in pretty good shape. I noticed that the
> return value of pg_prng_strong_seed() is not checked in several
> places, also there was a typo in pg_trgm.c. The corrected patch is
> attached. Assuming the new version will not upset cfbot, I would call
> the patch "Ready for Committer".

I took a quick look through this. The biggest substantive point
I found was that you didn't update the configure script. It's
certainly not appropriate for configure to continue to do
AC_REPLACE_FUNCS on random and srandom when you've removed the
src/port files that that would attempt to include.

The simplest change here is just to delete those entries from the
list, but that would also result in not #define'ing HAVE_RANDOM
or HAVE_SRANDOM, and I see that the patch introduces a dependency
on the latter. I'm inclined to think that's misguided. srandom()
has been required by POSIX since SUSv2, and we certainly have not
got any non-Windows buildfarm members that lack it. So I don't
think we really need a configure check. What we do need is a decision
about what to do on Windows. We could write it like

+#ifndef WIN32
+ srandom(pg_prng_i32(&pg_global_prng_state));
+#endif

but I have a different modest suggestion: add

#define srandom(seed) srand(seed)

in win32_port.h. As far as I can see from Microsoft's docs [1],
srand() is exactly like srandom(), they just had some compulsion
to not be POSIX-compatible.

BTW, the commentary in InitProcessGlobals is now completely
inadequate; it's unclear to a reader why we should be bothering
ith srandom(). I suggest adding a comment right before the
srandom() call, along the lines of

/*
* Also make sure that we've set a good seed for random() (or rand()
* on Windows). Use of those functions is deprecated in core
* Postgres, but they might get used by extensions.
*/

+/* use Donald Knuth's LCG constants for default state */

How did Knuth get into this? This algorithm is certainly not his,
so why are those constants at all relevant?

Other cosmetic/commentary issues:

* I could do without the stream-of-consciousness notes in pg_prng.c.
I think what's appropriate is to say "We use thus-and-such a generator
which is documented here", maybe with a line or two about its properties.

* Function names like these convey practically nothing to readers:

+extern int64 pg_prng_i64(pg_prng_state *state);
+extern uint32 pg_prng_u32(pg_prng_state *state);
+extern int32 pg_prng_i32(pg_prng_state *state);
+extern double pg_prng_f64(pg_prng_state *state);
+extern bool pg_prng_bool(pg_prng_state *state);

and these functions' header comments add a grand total of zero bits
of information. What someone generally wants to know first about
a PRNG is (a) is it uniform and (b) what is the range of outputs,
neither of which are specified anywhere.

+#define FIRST_BIT_MASK UINT64CONST(0x8000000000000000)
+#define RIGHT_HALF_MASK UINT64CONST(0x00000000FFFFFFFF)
+#define DMANTISSA_MASK UINT64CONST(0x000FFFFFFFFFFFFF)

I'm not sure that these named constants are any more readable than
writing the underlying constant, maybe less so --- in particular
I think something based on (1<<52)-1 would be more appropriate for
the float mantissa operations. We don't need RIGHT_HALF_MASK at
all, the casts to uint32 or int32 will accomplish that just fine.

BTW, why are we bothering with FIRST_BIT_MASK in the first place,
rather than returning "v & 1" for pg_prng_bool? Is xoroshiro128ss
less random in the low-order bits than the higher? If so, that would
be a pretty important thing to document. If it's not, we shouldn't
make the code look like it is.

+ * select in a range with bitmask rejection.

What is "bitmask rejection"? Is it actually important to callers?
I think this should be documented more like "Produce a random
integer uniformly selected from the range [rmin, rmax)."

regards, tom lane

[1] https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/srand?view=msvc-170

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2021-11-26 18:59:25 Re: Why not try for a HOT update, even when PageIsFull()?
Previous Message Andrew Dunstan 2021-11-26 18:23:09 Re: Windows build warnings