From: | Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Alik Khilazhev <a(dot)khilazhev(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [WIP] Zipfian distribution in pgbench |
Date: | 2017-08-05 07:49:46 |
Message-ID: | alpine.DEB.2.20.1708050930520.16395@lancre |
Lists: | pgsql-hackers |
Hello Peter,
> I think that it would also be nice if there was an option to make
> functions like random_zipfian() actually return a value that has
> undergone perfect hashing. When this option is used, any given value
> that the function returns would actually be taken from a random mapping
> to some other value in the same range. So, you can potentially get a
> Zipfian distribution without the locality.
I definitely agree. This is a standard problem with all non-uniform random
generators in pgbench, namely random_{gaussian,exponential}.
However, hashing is not a good solution on a finite domain because of the
significant collision rate: when n values are hashed into n slots, typically
about 1/3 of the slots (1/e, roughly 37%) end up empty, and the collisions
cause spikes. Collisions would also break primary keys.
The solution is to provide a (good) pseudo-random parametric permutation,
which is non-trivial, especially for sizes that are not powers of two, so
ISTM that it should be a patch of its own.
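One common way to get a parametric permutation of [0, n) for arbitrary n is "cycle walking": build an invertible mixing function on the enclosing power-of-two domain [0, 2^k) and reapply it until the result falls below n. The C sketch below assumes a few rounds of xor-with-key, multiplication by an odd constant, and xorshift (each a bijection on k-bit values) mix well enough for benchmarking. `permute_index`, `check_permutation` and the round constants are hypothetical names chosen for illustration, not a pgbench API, and this is not necessarily the approach the author has in mind.

```c
#include <stdint.h>
#include <stdlib.h>

/* Smallest mask of the form 2^k - 1 covering [0, n). Assumes n >= 1. */
static uint64_t domain_mask(uint64_t n)
{
    uint64_t mask = 0;

    while (mask < n - 1)
        mask = (mask << 1) | 1;
    return mask;
}

/* One round; every step is a bijection on [0, mask], mask = 2^k - 1. */
static uint64_t round_fn(uint64_t x, uint64_t key, uint64_t mask, int shift)
{
    x ^= key & mask;              /* xor with key */
    x = (x * (key | 1)) & mask;   /* odd multiplier is invertible mod 2^k */
    x ^= x >> shift;              /* xorshift is invertible for shift >= 1 */
    return x;
}

/* Map x in [0, n) to a pseudo-random position in [0, n), bijectively for a
 * fixed (n, seed). Precondition: x < n. Cycle walking terminates because x's
 * orbit under a bijection on [0, 2^k) must eventually return to x itself. */
static uint64_t permute_index(uint64_t x, uint64_t n, uint64_t seed)
{
    uint64_t mask = domain_mask(n);
    int bits = 0;

    for (uint64_t m = mask; m != 0; m >>= 1)
        bits++;

    do
    {
        for (int r = 0; r < 4; r++)
            x = round_fn(x, seed + (uint64_t) r * UINT64_C(0x9e3779b97f4a7c15),
                         mask, bits / 2 + 1);
    } while (x >= n);   /* walk the cycle until we land inside [0, n) */

    return x;
}

/* Check that permute_index(., n, seed) hits every value in [0, n) once. */
static int check_permutation(uint64_t n, uint64_t seed)
{
    unsigned char *seen = calloc(n, 1);
    int ok = 1;

    if (seen == NULL)
        return 0;
    for (uint64_t i = 0; i < n && ok; i++)
    {
        uint64_t p = permute_index(i, n, seed);

        if (p >= n || seen[p])
            ok = 0;
        else
            seen[p] = 1;
    }
    free(seen);
    return ok;
}
```

Since 2^k < 2n, the expected number of walks per call is below 2. Applied to the output of a non-uniform generator, such a permutation moves the hot values around without merging any of them, so keys stay unique.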
The good news is that it is on my todo list and I have some ideas on how
to do it.
The bad news is that, given the rate at which I succeed in getting things
committed in pgbench, it might take some years :-( For instance, simple
functions and operators extending the current set have been in the pipeline
for 15 months and missed pg10.
--
Fabien.