From: | Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com> |
---|---|
To: | Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: gaussian distribution pgbench |
Date: | 2014-07-18 07:16:37 |
Message-ID: | CADupcHXwhX8ab6jjCVp4up8jJjpPxS8W2xedcqan1nx+yUhT1g@mail.gmail.com |
Lists: | pgsql-hackers |
2014-07-18 5:13 GMT+09:00 Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>:
>
> However, ISTM that it is not the purpose of pgbench documentation to be a
>>> primer about what is an exponential or gaussian distribution, so the idea
>>> would yet be to have a relatively compact explanation, and that the
>>> interested but clueless reader would document h..self from wikipedia or a
>>> text book or a friend or a math teacher (who could be a friend as
>>> well:-).
>>>
>>
>> Well, I think it's a balance. I agree that the pgbench documentation
>> shouldn't try to substitute for a text book or a math teacher, but I
>> also think that you shouldn't necessarily need to refer to a text book
>> or a math teacher in order to figure out how to use pgbench. Saying
>> "it's complicated, so we don't have to explain it" would be a cop out;
>> we need to *make* it simple. And if there's no way to do that, then
>> IMHO we should reject the patch in favor of some future patch that
>> implements something that will be easy for users to understand.
>>
>> [nttcom(at)localhost postgresql]$ contrib/pgbench/pgbench --exponential=10
>>>>> starting vacuum...end.
>>>>> transaction type: Exponential distribution TPC-B (sort of)
>>>>> scaling factor: 1
>>>>> exponential threshold: 10.00000
>>>>>
>>>>> decile percents: 63.2% 23.3% 8.6% 3.1% 1.2% 0.4% 0.2% 0.1% 0.0% 0.0%
>>>>> highest/lowest percent of the range: 9.5% 0.0%
>>>>>
>>>>
>>>> I don't have a clue what that means. None.
>>>>
>>>
>>> Maybe we could add in front of the decile/percent
>>>
>>> "distribution of increasing account key values selected by pgbench:"
>>>
>>
>> I still wouldn't know what that meant. And it misses the point
>> anyway: if the documentation is good, this will be unnecessary. If
>> the documentation is bad, a printout that tries to illustrate it by
>> example is not an acceptable substitute.
>>
>
> The decile description is quite classic when discussing statistics.
Yeah, maybe. Fabien-san and I don't believe that he doesn't know what decile
percentages are.
However, I think more description of the deciles is needed.
For example, when we set the number of transactions to 10,000 (-t 10000),
the range of aid is 100,000,
and --exponential is 10, the decile percents are as follows, as you know.
decile percents: 63.2% 23.3% 8.6% 3.1% 1.2% 0.4% 0.2% 0.1% 0.0% 0.0%
highest/lowest percent of the range: 9.5% 0.0%
They mean the following.
#number of accesses per range of aid (from decile percents):
1 to 10,000 => 6,320 times
10,001 to 20,000 => 2,330 times
20,001 to 30,000 => 860 times
...
90,001 to 100,000 => 0 times
#number of accesses per range of aid (from highest/lowest percent of the
range):
1 to 1,000 => 950 times
...
99,001 to 100,000 => 0 times
That's all.
This information makes the distribution of access probability easy to
understand, doesn't it?
Maybe Fabien-san and I have some background in mathematics, so we think
decile percentages are common sense.
But if they aren't common sense, I agree with adding these explanations
to the documentation.
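The arithmetic behind the listing above can be sketched in a few lines of Python. This is only an illustration, not code from the patch: the helper names `decile_shares` and `expected_accesses` are hypothetical, and the share formula assumes the keys follow an exponential distribution truncated to the aid range, which is consistent with the printed percentages but is not confirmed by this mail.

```python
import math


def decile_shares(threshold, buckets=10):
    """Share of draws in each equal-width slice of the key range.

    Assumption: a truncated exponential distribution over the range,
    with `threshold` playing the role of --exponential.
    """
    norm = 1.0 - math.exp(-threshold)
    return [
        (math.exp(-threshold * i / buckets)
         - math.exp(-threshold * (i + 1) / buckets)) / norm
        for i in range(buckets)
    ]


def expected_accesses(percents, transactions, naccounts):
    """Turn per-slice percentages into (first_aid, last_aid, count) rows."""
    width = naccounts // len(percents)
    return [
        (i * width + 1, (i + 1) * width, round(transactions * pct / 100))
        for i, pct in enumerate(percents)
    ]


# Percentages exactly as printed by pgbench in the example above.
reported = [63.2, 23.3, 8.6, 3.1, 1.2, 0.4, 0.2, 0.1, 0.0, 0.0]

for first, last, count in expected_accesses(reported, 10000, 100000):
    print(f"{first:,} to {last:,} => {count:,} times")
```

Calling `decile_shares(10, buckets=100)[0]` under the same assumption gives the share of the top 1% of the range, which corresponds to the "highest" figure in the pgbench output.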
Best regards,
--
Mitsumasa KONDO