From: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: rand48 replacement
Date: 2021-07-07 03:00:47
Message-ID: 8904daa4a916b4990b11d7140e0ff81f@postgrespro.ru
Lists: pgsql-hackers
Fabien COELHO wrote on 2021-07-06 23:49:
> Hello Yura,
>
>>> However, I'm not enthousiastic at combining two methods depending on
>>> the range, the function looks complex enough without that, so I would
>>> suggest not to take this option. Also, the decision process adds to
>>> the average cost, which is undesirable.
>>
>> Given 99.99% cases will be in the likely case, branch predictor should
>> eliminate decision cost.
>
> Hmmm. ISTM that a branch predictor should predict that unknown < small
> should probably be false, so a hint should be given that it is really
> true.
Why? The branch predictor is history based: if the branch was usually
true here, it will predict it to be true this time as well.
`unknown < small` is usually true, therefore the branch predictor will
assume it is true.
I put `likely` there for the compiler: the compiler then places the
`likely` path closer.
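(For reference, a simplified sketch of what `likely`/`unlikely` expand to
on GCC-compatible compilers -- not the exact definition. They only steer
the compiler's code layout; the CPU's history-based predictor is
unaffected.)

#define likely(x)	__builtin_expect((x) != 0, 1)
#define unlikely(x)	__builtin_expect((x) != 0, 0)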
>
>> And as Dean Rasheed correctly mentioned, mask method will have really
>> bad pattern for branch predictor if range is not just below or equal
>> to power of two.
>
> On average the bitmask is the better unbiased method, if the online
> figures are to be trusted. Also, as already said, I do not really want
> to add code complexity, especially to get lower average performance,
> and especially with code like "threshold = - range % range", where
> both variables are unsigned, I have a headache just looking at it:-)
If you mean https://www.pcg-random.org/posts/bounded-rands.html, then:
1. The first graphs are not made with Lemire's exact code. The final code
from https://lemire.me/blog/2016/06/30/fast-random-shuffling/
(which mine is derived from) performs the modulo operation only if
`(leftover < range)`. Even with `rand_range(1000000)` that branch is taken
just once in four thousand runs (see the sketch after this list).
2. Look at the "Small-Constant Benchmark" on that page: Debiased Int is
1.5 times faster. And even in the "Small-Shuffle" benchmark their
unoptimized version is on par with the mask method.
3. If you go to the "Making Improvements/Faster Threshold-Based Discarding"
section, you'll see the code my version matches. It is twice as fast as
the mask method in the Small-Shuffle benchmark, and just a bit slower in
Large-Shuffle.
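To make point 1 concrete, here is a minimal sketch of that
threshold-based discarding variant of the multiply method, following the
posts linked above. `next_u32()` is just a placeholder for a 32-bit
generator, not an existing function:

#include <stdint.h>

extern uint32_t next_u32(void);		/* placeholder PRNG */

static uint32_t
random_bounded(uint32_t range)
{
	uint64_t	mul = (uint64_t) next_u32() * range;
	uint32_t	leftover = (uint32_t) mul;	/* low 32 bits */

	if (leftover < range)	/* taken roughly range / 2^32 of the time */
	{
		/* 2^32 mod range, computed via unsigned wraparound */
		uint32_t	threshold = -range % range;

		while (leftover < threshold)
		{
			mul = (uint64_t) next_u32() * range;
			leftover = (uint32_t) mul;
		}
	}
	return (uint32_t) (mul >> 32);	/* high 32 bits: unbiased in [0, range) */
}

For range = 1000000 the `leftover < range` branch is the roughly
once-in-four-thousand case mentioned above, and the modulo is only
computed then.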
>
>> And __builtin_clzl is not free lunch either, it has latency 3-4 cycles
>> on modern processor.
>
> Well, % is not cheap either.
>
>> Well, probably it could run in parallel with some part of xoroshiro,
>> but it depends on how compiler will optimize this function.
>>
>>> I would certainly select the unbias multiply method if we want a u32
>>> range variant.
>>
>> There could be two functions.
>
> Yep, but do we need them? Who is likely to want 32 bits pseudo random
> ints in a range? pgbench needs 64 bits.
>
> So I'm still inclined to just keep the faster-on-average bitmask
> method, despite that it may be slower for some ranges. The average
> cost for the worst case in PRNG calls is, if I'm not mistaken:
>
> 1 * 0.5 + 2 * 0.25 + 3 * 0.125 + ... ~ 2
>
> which does not look too bad.
You don't count the cost of branch misprediction.
https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-processing-an-unsorted-array
https://lemire.me/blog/2019/10/15/mispredicted-branches-can-multiply-your-running-times/
Therefore the calculation should be at least:
1 * 0.5 + 0.5 * (3 + 0.5 * (3 + ...)) = 6
By the way, we have 64 bits of randomness. If we use 44 of those bits for
range <= (1<<20), the bias will be less than 1/(2**24). Could we just
ignore it (given it is not a crypto-strength random generator)?
uint64 pg_prng_u64_range(pg_prng_state *state, uint64 range)
{
	uint64		val = xoroshiro128ss(state);
	uint64		m;

	/* handle all power-of-two cases (including range == 0) */
	if ((range & (range - 1)) == 0)
		return range != 0 ? val & (range - 1) : 0;

	if (likely(range < (1 << 20)))
	{
		/*
		 * While the multiply method is biased, the bias will be smaller
		 * than 1/(1<<24) for such small ranges.  Let's ignore it.
		 */
		return ((val >> 20) * range) >> 44;
	}

	/* Apple's mask method */
	m = mask_u64(range - 1);
	val &= m;
	while (val >= range)
		val = xoroshiro128ss(state) & m;

	return val;
}
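(mask_u64() is not defined in this mail; as a sketch, it could look like
the following, using the count-leading-zeros builtin mentioned earlier in
the thread -- just an illustration, not actual patch code:)

static inline uint64
mask_u64(uint64 u)
{
	/* smallest mask of the form 2^k - 1 that covers u */
	return u == 0 ? 0 : (~(uint64) 0) >> __builtin_clzll(u);
}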
Anyway, excuse me for heating up this discussion over such a
non-essential issue.
I'll try to restrain myself and not push it further.
regards,
Sokolov Yura.