From: | Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> |
---|---|
To: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: rand48 replacement |
Date: | 2021-07-04 16:03:56 |
Message-ID: | alpine.DEB.2.22.394.2107041718570.2359988@pseudo |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> Now suppose we want a random number in the range [0,6). This is what
> happens with your algorithm for each of the possible prng() return
> values:
>
> prng() returns 0 -- OK
> prng() returns 1 -- OK
> prng() returns 2 -- OK
> prng() returns 3 -- OK
> prng() returns 4 -- OK
> prng() returns 5 -- OK
> prng() returns 6 -- out of range so use splitmix3() with initial state=6:
> state=6 -> 3, return 7 -- still out of range, so repeat
> state=3 -> 0, return 3 -- now OK
> prng() returns 7 -- out of range so use splitmix3() with initial state=7:
> state=7 -> 4, return 4 -- now OK
>
> So, assuming that prng() chooses each of the 8 possible values with
> equal probability (1/8), the overall result is that the values 0,1,2
> and 5 are returned with a probability of 1/8, whereas 3 and 4 are
> returned with a probability of 2/8.
Ok, I got that explanation.
> Using the correct implementation of the bitmask algorithm, each
> iteration calls prng() again, so in the end no particular return value
> is ever more likely than any other (hence it's unbiased).
Ok, you're taking into account the number of states of the PRNG, so this
finite number implies some bias on some values if you actually enumerate
all possible cases, as you do above.
I was reasoning "as if" the splitmix PRNG was an actual random function,
which is obviously false, but is also somehow a usual (false) assumption
with PRNGs, and with this false assumption my implementation is perfect:-)
The defect of the modulo method for range extraction is that even with an
actual (real) random generator the results would be biased. The bias is in
the method itself. Now you are arguing for a bias linked to the internals
of the PRNG. They are not the same in nature, even if the effect is the
same.
Also the bias is significant for close to the limit ranges, which is not
the kind of use case I have in mind when looking at pgbench.
If you consider the PRNG internals, then splitmix extraction function
could also be taken into account. If it is not invertible (I'm unsure),
then assuming it is some kind of hash function, about 1/e of output values
would not reachable, which is yet another bias that you could argue
against.
Using the initial PRNG works better only because the underlying 128 bits
state is much larger than the output value. Which is the point for having
a larger state in the first place, anyway.
> As for determinism, the end result is still fully deterministic. For
> example, lets say that prng() returns the following sequence, for some
> initial state:
>
> 1,0,3,0,3,7,4,7,6,6,5,3,7,7,7,0,3,6,5,2,3,3,4,0,0,2,7,4,...
>
> then the bitmask method just returns that sequence with all the 6's
> and 7's removed:
>
> 1,0,3,0,3,4,5,3,0,3,5,2,3,3,4,0,0,2,4,...
>
> and that same sequence will always be returned, when starting from
> that initial seed.
Yes and no.
The result is indeed deterministic of you call the function with the same
range. However, if you change the range value in one place then sometimes
the state can advance differently, so the determinism is lost, meaning
that it depends on actual range values.
Attached a v7 which does as you wish, but also looses the deterministic
non-value dependent property I was seeking. I would work around that by
deriving another 128 bit generator instead of splitmix 64 bit, but that is
overkill.
--
Fabien.
Attachment | Content-Type | Size |
---|---|---|
prng-7.patch | text/x-diff | 47.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Dean Rasheed | 2021-07-04 17:35:33 | Re: rand48 replacement |
Previous Message | Tom Lane | 2021-07-04 15:58:27 | Re: PostgreSQL-13.3 parser.y with positional references by named references |