From: | Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: gaussian distribution pgbench -- splits v4 |
Date: | 2014-08-01 07:26:53 |
Message-ID: | alpine.DEB.2.10.1408010905040.9457@sto |
Lists: | pgsql-hackers |
Hello,
>> Version one is "k' = 1 + (a * k + b) modulo n" with "a" prime with
>> respect to "n", "n" being the number of keys. This is nearly possible,
>> but for the modulo operator which is currently missing, and that I'm
>> planning to submit for this very reason, but probably another time.
>
> That's pretty crude,
Yep. It is very simple, it is much better than nothing, and for a database
test it may be "good enough".
> although I don't object to a modulo operator. It would be nice to be
> able to use a truly random permutation, which is not hard to generate
> but probably requires O(n) storage, likely a problem for large scale
> factors.
That is indeed the actual issue in my mind. I was thinking of permutations
defined by a formula, which are not so easy to find and may end up looking
like "(a*k+b)%n" anyway. I had the same issue when generating random data
for a schema (see http://www.coelho.net/datafiller.html).
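For illustration, the "(a*k+b)%n" mapping can be sketched in a few lines. This is only a sketch in Python, not pgbench script syntax; the name affine_permute and the values chosen for n, a and b are hypothetical. It relies on the fact that the mapping is a bijection on 1..n exactly when "a" is coprime with "n".

```python
from math import gcd

def affine_permute(k, n, a, b):
    """Map key k in 1..n to k' = 1 + (a * k + b) mod n.

    The mapping is a permutation of 1..n only if a is coprime with n,
    so that condition is checked explicitly.
    """
    if gcd(a, n) != 1:
        raise ValueError("a must be coprime with n")
    return 1 + (a * k + b) % n

# Hypothetical parameters, chosen only for the example.
n = 10
mapped = [affine_permute(k, n, a=7, b=3) for k in range(1, n + 1)]

# Every key in 1..n is hit exactly once.
is_permutation = sorted(mapped) == list(range(1, n + 1))
```

The storage cost is O(1), which is the point of using a formula, but neighbouring keys are still mapped in a regular pattern, hence "crude".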
> Maybe somebody who knows more math than I do (like you, probably!) can
> come up with something more clever.
I can certainly suggest other formulas, but that does not mean beautiful
code, so they would probably be rejected. I'll see.
An alternative to this whole process may be to hash/modulo a non-uniform
random value.
id = 1 + hash(some-random()) % n
But the hashing changes the distribution as it adds collisions, so I have
to think about how to be able to control the distribution in that case,
and what hash function to use.
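A rough sketch of the hash/modulo idea and of the collision effect, again in Python rather than pgbench. Everything here is an assumption made for the example: SHA-256 stands in for a hash function that has not been chosen, a clamped Gaussian stands in for "some-random()", and the names hashed_id and skewed_value are hypothetical.

```python
import hashlib
import random
from collections import Counter

def hashed_id(value, n):
    """Map an integer to 1..n through a hash; distinct inputs may collide."""
    digest = hashlib.sha256(str(value).encode("ascii")).digest()
    return 1 + int.from_bytes(digest[:8], "big") % n

def skewed_value(rng, n):
    """Hypothetical non-uniform source: Gaussian centred on n/2, clamped to 1..n."""
    v = round(rng.gauss(n / 2, n / 10))
    return min(max(v, 1), n)

n = 100
rng = random.Random(42)  # fixed seed so the run is repeatable

# Draw skewed values, then scatter them over the key space with the hash.
ids = [hashed_id(skewed_value(rng, n), n) for _ in range(10_000)]
hot = Counter(ids).most_common(3)  # the "hot" ids are no longer adjacent keys

# Number of distinct ids reachable from the n possible source values.
distinct_targets = len({hashed_id(v, n) for v in range(1, n + 1)})
```

When distinct_targets comes out below n, some ids can never be drawn and others receive the combined weight of several source values, which is how the collisions change the distribution.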
--
Fabien.