Re: pgbench - add pseudo-random permutation function

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Hironobu SUZUKI <hironobu(at)interdb(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - add pseudo-random permutation function
Date: 2021-03-30 19:31:13
Message-ID: CAEZATCWMUEdoxkC-R9-Uq5q5N7oWbW_vFy8N1Ph3aBN9f=ORHQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 30 Mar 2021 at 19:26, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
>
> First, I have a thing against erand48.

Yeah, that's probably a fair point. However, all the existing pgbench
random functions are using it, so I think it's fair enough for
permute() to do the same (and actually 2^48 is pretty huge). Switching
to a 64-bit PRNG might not be a bad idea, but I think that's something
we'd want to do across the board, and so I think it should be out of
scope for this patch.

> Second, I have a significant reservation about the very structure of the
> transformation in this version:
>
> loop 4 times :
>
> // FIRST HALF STEER
> m/r = pseudo randoms
> if v in first "half"
> v = ((v * m) ^ r) & mask;
> rotate1(v)
>
> // FULL SHIFT 1
> r = pseudo random
> v = (v + r) % size
>
> // SECOND HALF STEER
> m/r = pseudo randoms
> if v in second "half"
> same as previous on second half
>
> // FULL SHIFT 2
> r = pseudo random
> v = (v + r) % size
>
> I'm really at odds with FULL SHIFT 1, because it means that up to 1/256 of
> values are kept out of STEERING. Whole chunks of values could be kept
> unshuffled because they would only have SHIFTS apply to them and each time
> fall in the not steered half. It should be an essential part of the design
> that at least one steer is applied on a value at each round, and if two
> are applied then fine, but certainly not zero. So basically I think that
> the design would be significantly improved by removing "FULL SHIFT 1".

Ah, that's a good point. Something else that also concerned me there
was that it might lead to 2 consecutive full shifts with nothing in
between, which would lead to less uniform randomness (like the
Irwin-Hall distribution).

I just did a quick test without the first full shift, and the results
do appear to be better, so removing that looks like a good idea.

> Third, I think that the rotate code can be simplified, in particular the
> ?: should be avoided because it may induce branches quite damaging to
> processor performance.

Yeah, I wondered about that. Perhaps there's a "trick" that can be
used to simplify it. Pre-computing the number of bits in the mask
would probably help. I'll give it some thought.

Regards,
Dean

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joel Jacobson 2021-03-30 19:32:59 Re: Idea: Avoid JOINs by using path expressions to follow FKs
Previous Message Joel Jacobson 2021-03-30 19:08:07 Bug? pg_identify_object_as_address() et al doesn't work with pg_enum.oid