Re: gaussian distribution pgbench

From: Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gaussian distribution pgbench
Date: 2014-03-15 08:56:42
Message-ID: CADupcHUixdyp7R8LQz9Rf6CU2Mi0GFiyeywXu=XHOEWqh4VPxA@mail.gmail.com
Lists: pgsql-hackers

Oh, sorry, I forgot to include the URLs of the pages the picture refers to.

http://en.wikipedia.org/wiki/Normal_distribution
http://en.wikipedia.org/wiki/Exponential_distribution

regards,
--
Mitsumasa KONDO

2014-03-15 17:50 GMT+09:00 Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>:

> Hi
>
> 2014-03-15 15:53 GMT+09:00 Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>:
>
>
>> Hello Heikki,
>>
>>
>> A couple of comments:
>>>
>>> * There should be an explicit "\setrandom ... uniform" option too, even
>>> though you get that implicitly if you don't specify the distribution
>>>
>>
>> Indeed. I agree. I suggested it, but it got lost.
>
> OK. If we follow the SQL grammar, you are right. I will add it.
>
>
>> * What exactly does the "threshold" mean? The docs informally explain
>>> that "the larger the threshold, the more frequent values close to the middle
>>> of the interval are drawn", but that's pretty vague.
>>>
>>
>> There are explanations and computations as comments in the code. If it is
>> about the documentation, I'm not sure that a very precise mathematical
>> definition will help a lot of people, and might rather hinder
>> understanding, so the doc focuses on an intuitive explanation instead.
>
> Yeah, I think we had better explain only the information needed to use
> this feature. If we add the mathematical theory to the docs, it will be too
> difficult for users, and a waste of effort.
>
>
>> * Does min and max really make sense for gaussian and exponential
>>> distributions? For gaussian, I would expect mean and standard deviation as
>>> the parameters, not min/max/threshold.
>>>
>>
>> Yes... and no:-) The aim is to draw an integer primary key from a table,
>> so it must be in a specified range. This is approximated by drawing a
>> double value with the expected distribution (gaussian or exponential) and
>> projecting it carefully onto integers. If it is out of range, there is a loop
>> and another value is drawn. The minimal threshold constraint (2.0) ensures
>> that the probability of looping is low.
>
> I think it is difficult to understand from our text alone, so I created a
> picture that should help you understand it.
> Please see it.
>
>
>>
>> * How about setting the variable as a float instead of integer? Would
>>> seem more natural to me. At least as an option.
>>>
>>
>> Which variable? The values set by setrandom are mostly used for primary
>> keys. We really want integers in a range.
>
> I think he meant the threshold parameter. The threshold is a very sensitive
> parameter, so it needs to be set as a double. I think you will agree once
> you see the attached picture.
>
> regards,
> --
> Mitsumasa KONDO
> NTT Open Source Software Center
>
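
The draw-and-retry scheme Fabien describes in the quoted text (draw a double with the expected distribution, retry if it falls out of range, project onto the integer range) can be sketched roughly as follows. This is only an illustration in Python, not the patch's C code: the names gaussian_int and retry_probability are hypothetical, the standard normal draw uses Python's random.gauss, and the linear projection of [-threshold, threshold) onto [lo, hi] is an assumption about what "project it carefully" means.

```python
import math
import random


def gaussian_int(rng, lo, hi, threshold):
    """Draw an integer in [lo, hi], clustered around the middle of the range.

    Hypothetical helper: a standard normal value is redrawn until it lies in
    [-threshold, threshold), then mapped linearly onto the integer range.
    """
    if threshold < 2.0:
        # The thread mentions a minimal threshold of 2.0 to keep retries rare.
        raise ValueError("threshold must be at least 2.0")
    while True:
        z = rng.gauss(0.0, 1.0)
        if -threshold <= z < threshold:
            break  # in range; otherwise loop and draw another value
    unit = (z + threshold) / (2.0 * threshold)  # mathematically in [0, 1)
    # min() guards against floating-point rounding pushing unit up to 1.0.
    return min(hi, lo + int(unit * (hi - lo + 1)))


def retry_probability(threshold):
    """Probability that one standard normal draw lies outside +/- threshold."""
    return math.erfc(threshold / math.sqrt(2.0))


if __name__ == "__main__":
    rng = random.Random(2014)
    print([gaussian_int(rng, 1, 100, 5.0) for _ in range(10)])
    print(retry_probability(2.0))
```

Under these assumptions, a threshold of 2.0 makes a retry happen on roughly 4.6% of draws, and a larger threshold both lowers that figure and concentrates the results more tightly around the middle of [lo, hi]. Small changes in the threshold shift the shape noticeably, which is consistent with keeping it a double.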
