From: | Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> |
---|---|
To: | Alik Khilazhev <a(dot)khilazhev(at)postgrespro(dot)ru> |
Cc: | PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [WIP] Zipfian distribution in pgbench |
Date: | 2017-08-05 08:15:05 |
Message-ID: | alpine.DEB.2.20.1708050950220.16395@lancre |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello Alik,
>> So I would be in favor of expanding the documentation but not
>> restricting the parameter beyond avoiding value 1.0.
>
> I have removed restriction and expanded documentation in attaching patch v5.
I've done some math investigations, which consisted in spending one hour
with Christian, a statistician colleague of mine. He took an old book out
of a shelf, opened it to page 550 (roughly in the middle), and explained
to me how to build a real zipfian distribution random generator.
The iterative method is for parameter a>1 and works for unbounded values.
It is simple to add a bound. In practice the iterative method is quite
effective, i.e. number of iterations is typically small, at least if the
bound is large and if parameter a is not too close to 1.
I've attached a python3 script which implements the algorithm. It looks
like magic. Beware that a C implementation should take care of float and
int overflows.
# usage: a, #values, #tests
sh> zipf.py 1.5 1000 1000000
# after 1.7 seconds
c = [391586, 138668, 75525, 49339, 35222, 26621, ...
... 11, 13, 12, 11, 16] (1.338591 iterations per draw)
sh> zipf.py 1.1 1000 1000000
# after 3.1 seconds
c = [179302, 83927, 53104, 39015, 30557, 25164, ...
... 82, 95, 93, 81, 80] (2.681451 iterations per draw)
I think that this method should be used for a>1, and the other very rough
one can be kept for parameter a in [0, 1), a case which does not make much
sense to a mathematician as it diverges if unbounded.
--
Fabien.
Attachment | Content-Type | Size |
---|---|---|
zipf.py | text/x-python | 1.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Fabien COELHO | 2017-08-05 09:03:34 | Re: Row Level Security Documentation |
Previous Message | Michael Paquier | 2017-08-05 08:14:11 | Re: pg_stop_backup(wait_for_archive := true) on standby server |