From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Georgios Kokolatos <gkokolatos(at)pm(dot)me>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CPU costs of random_zipfian in pgbench
Date: 2019-03-24 20:21:45
Message-ID: alpine.DEB.2.21.1903242006310.9939@lancre
Lists: pgsql-hackers
Hello Tom,
>>> If this is done, some people whose zipfian distributions currently
>>> work might be unhappy.
>
>> After giving it some thought, I think that this cannot be fully fixed for
>> 12.
>
> Just to clarify --- my complaint about "over engineering" referred to
> the fact that a cache exists at all; fooling around with the overflow
> behavior isn't really going to answer that.
Having some cache really makes sense for s < 1, AFAICS.
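To make the cost concrete: Gray's method needs the generalized harmonic
number H(n,s) = sum_{k=1..n} 1/k^s, which takes O(n) to compute, so
recomputing it on every draw would be unusable for large n. A minimal
sketch of the kind of memoization I mean (names are mine, not the actual
pgbench code):

    #include <math.h>
    #include <stdint.h>

    /* generalized harmonic number H(n,s) = sum_{k=1}^{n} 1/k^s: O(n) */
    static double
    generalized_harmonic(int64_t n, double s)
    {
        double sum = 0.0;

        for (int64_t k = n; k >= 1; k--)   /* small terms first, for accuracy */
            sum += pow((double) k, -s);
        return sum;
    }

    /* one-entry cache keyed on (n, s): recompute only when they change */
    static int64_t cached_n = -1;
    static double  cached_s;
    static double  cached_zetan;

    static double
    zetan_cached(int64_t n, double s)
    {
        if (n != cached_n || s != cached_s)
        {
            cached_zetan = generalized_harmonic(n, s);
            cached_n = n;
            cached_s = s;
        }
        return cached_zetan;
    }

The real cache must of course handle several (n, s) pairs and per-thread
access, which is where the complexity you object to comes from.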
> The bigger picture here is that a benchmarking tool that contains its
> own performance surprises is not a nice thing to have.
Hmmm. It seems that it depends. Sometimes self-inflicted wounds are the
soldier's responsibility, sometimes they must be prevented.
> It's not very hard to imagine somebody wasting a lot of time trying to
> explain weird results, only to find out that the cause is unstable
> performance of random_zipfian. Or worse, people might draw totally
> incorrect conclusions if they fail to drill down enough to realize that
> there's a problem in pgbench rather than on the server side.
Yep, benchmarking is tougher than it looks: it is very easy to get it
wrong without knowing it, whatever tool you use.
>> Given the constraint of Jim Gray's approximation method for s in (0, 1),
>> which really does zipfian for the first two integers and then uses an
>> exponential approximation, the only approach is that the parameters must
>> be computed in a partial eval preparation phase before the bench code is
>> run. This means that only (mostly) constants would be allowed as
>> parameters when s is in (0, 1), but I think that this is acceptable
>> because the method fundamentally requires it anyway.
>
> Yeah, if we could store all the required harmonic numbers before the
> test run timing starts, that would address the concern about stable
> performance. But I have to wonder whether zipfian with s < 1 is useful
> enough to justify so much code.
I do not know about that. I do not think that Jim Gray chose to invent an
approximation method for s < 1 because he thought it was fun. I think he
did it because his benchmark data required it. If you need it, you need
it. If you do not need it, you do not care.
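For reference, here is how I understand the approximation from Gray et
al.: the first two values get their exact zipfian probabilities, and the
rest of the uniform mass is mapped through a power-law formula. A sketch
(helper names are mine; it assumes the generalized_harmonic above, and
note that zetan, alpha, beta and eta depend only on n and s, hence the
precomputation requirement):

    /* Gray et al.'s zipfian approximation for s in (0, 1) over 1..n;
     * "uniform" is a uniform draw in [0, 1) */
    static int64_t
    zipfian_gray(int64_t n, double s, double uniform)
    {
        double zetan = generalized_harmonic(n, s);   /* cache me! */
        double alpha = 1.0 / (1.0 - s);
        double beta  = pow(0.5, s);
        double eta   = (1.0 - pow(2.0 / (double) n, 1.0 - s))
                       / (1.0 - (1.0 + beta) / zetan);
        double uz    = uniform * zetan;

        if (uz < 1.0)
            return 1;               /* exact: P(1) = 1 / zetan */
        if (uz < 1.0 + beta)
            return 2;               /* exact: P(2) = 2^{-s} / zetan */
        /* exponential approximation for values 3..n */
        return 1 + (int64_t) ((double) n
                              * pow(eta * uniform - eta + 1.0, alpha));
    }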
>> The other attached patch illustrates what I call poor performance for
>> stupid parameters (no point in doing zipfian on 2 integers…):
>> ...
>> Maybe the implementation could impose that s is at least 1.001 to avoid
>> the lower performance?
>
> I was wondering about that too. It seems like it'd be a wise idea to
> further constrain s and/or n to ensure that the s > 1 code path doesn't do
> anything too awful ...
Yep. The attached version enforces s >= 1.001, which avoids the worst of
the iteration cost, according to my small tests.
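For the record, a sketch of a Devroye-style rejection sampler for s > 1,
which is the kind of iteration involved (names are mine; drand() stands
for any uniform [0, 1) generator). The trouble near s = 1 is the bound
check: x = floor(u^{-1/(s-1)}) overshoots n for almost every u, since
P(x <= n) is about 1 - n^{-(s-1)}, which vanishes as s goes to 1. At
s = 1.001 and n = 10^6 that is already around 70 iterations per draw, and
it gets much worse below that, hence the 1.001 floor.

    #include <math.h>
    #include <stdint.h>
    #include <stdlib.h>

    static double
    drand(void)     /* stand-in uniform [0, 1) generator */
    {
        return (double) rand() / ((double) RAND_MAX + 1.0);
    }

    /* rejection sampler for zipfian over 1..n with s > 1 (after Devroye) */
    static int64_t
    zipfian_reject(int64_t n, double s)
    {
        double b = pow(2.0, s - 1.0);
        double x, t, u, v;

        do
        {
            u = drand();
            v = drand();
            x = floor(pow(u, -1.0 / (s - 1.0)));
            t = pow(1.0 + 1.0 / x, s - 1.0);
        } while (x > (double) n                   /* out of bound: retry */
                 || v * x * (t - 1.0) / (b - 1.0) > t / b);

        return (int64_t) x;
    }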
> unless someone comes up with a better implementation that has stable
> performance without such constraints.
Hmmm…
--
Fabien.
Attachment: pgbench-remove-zipf-below-one-2.patch (text/x-diff, 11.0 KB)