From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: CPU costs of random_zipfian in pgbench |
Date: | 2019-02-17 22:08:31 |
Message-ID: | 779980b4-a4a6-3bf9-7ecf-56cc9ce6f5be@2ndquadrant.com |
Lists: | pgsql-hackers |
On 2/17/19 5:09 PM, Tom Lane wrote:
> Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> writes:
>>> I'm trying to use random_zipfian() for benchmarking of skewed data sets,
>>> and I ran head-first into an issue with rather excessive CPU costs.
>
>> If you want skewed but not especially zipfian, use exponential which is
>> quite cheap. Also zipfian with a > 1.0 parameter does not have to compute
>> the harmonic number, so it depends on the parameter.
>
> Maybe we should drop support for parameter values < 1.0, then. The idea
> that pgbench is doing something so expensive as to require caching seems
> flat-out insane from here.
Maybe.

It's not quite clear to me why we support the two modes at all. We use
one algorithm for values < 1.0 and another for values > 1.0; what's
the difference there? Are those distributions materially different?
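
For context on the cost mentioned in the quoted text, here is a minimal sketch of the generalized harmonic number H(n, s) = 1/1^s + 1/2^s + ... + 1/n^s computed by direct summation. This is an illustration only, not pgbench's code; the function name `generalized_harmonic` is hypothetical.

```python
import math


def generalized_harmonic(n: int, s: float) -> float:
    """Return H(n, s) = sum of 1 / i**s for i in 1..n.

    Direct summation, so the cost is one power and one division
    per element of the range: O(n) work for each distinct (n, s).
    """
    # math.fsum limits floating-point error when adding many small terms.
    return math.fsum(1.0 / i ** s for i in range(1, n + 1))
```

Because the work grows linearly with the size of the range, a range of a few million values means a few million power computations for every distinct (n, s) pair, which is presumably why caching the result came up at all.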
Also, I wonder if just dropping support for parameters < 1.0 would be
enough, because the docs say:

    The function's performance is poor for parameter values close and
    above 1.0 and on a small range.

which seems to suggest it might be slow even for values > 1.0 in some
cases. Not sure.
> That cannot be seen as anything but a foot-gun
> for unwary users. Under what circumstances would an informed user use
> that random distribution rather than another far-cheaper-to-compute one?
>
>> ... This is why I submitted a pseudo-random permutation
>> function, which alas does not get much momentum from committers.
>
> TBH, I think pgbench is now much too complex; it does not need more
> features, especially not ones that need large caveats in the docs.
> (What exactly is the point of having zipfian at all?)
>
I wonder about the growing complexity of pgbench too ...
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services