From: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
---|---|
To: | Honza Horak <hhorak(at)redhat(dot)com> |
Cc: | pgsql-bugs(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: random() generates collisions too early |
Date: | 2013-10-24 14:04:00 |
Message-ID: | 526928D0.6090707@vmware.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On 22.10.2013 15:41, Honza Horak wrote:
> On 10/21/2013 05:14 PM, Tom Lane wrote:
>> Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
>>> On 18.10.2013 14:55, Honza Horak wrote:
>>>> The results covered only 181383 distinct values, and 68 values
>>>> repeated four or five times each. We should at least consider using a
>>>> higher-entropy seed.
>>
>>> Interesting. PostgreSQL's random() function just calls the underlying
>>> libc random() function. I assume you tested this on with Linux and
>>> glibc.
>>
>> I agree with the theory that this probably isn't the fault of the
>> random()
>> function as such, but with our code to reset the random seed when forking
>> a postmaster child process. Note that the test case is only examining the
>> first random value created in each process. So basically what this is
>> measuring is the number of different seed values we use.
>>
>> regards, tom lane
>>
>
> After playing a bit more with random() value and the seed defined in
> src/backend/postmaster/postmaster.c:4038 it seems really like the seed
> doesn't have very good characteristic. The PID number used there ensures
> the value to be different in child processes, but when only xor-ed with
> usecs, it doesn't enlarge the total data domain of the seed, which stays
> 1M values. Then having in mind the birthday paradox and probably not
> ideal PRNG, it doesn't seem unrealistic to get two same random numbers
> in only hundreds/thousands of tries.
>
> In order to enlarge possible data domain of the seed we can include sec
> count as well and shift some xor-ed variables to use the whole range or
> unsigned int. The attached patch uses a way that gave me much better
> results (collision found approx. after a hundred thousands of values):
>
> - srandom((unsigned int) (MyProcPid ^ usecs));
> + srandom((unsigned int) (MyProcPid ^ (usecs << 12) ^ secs));
Seems reasonable, committed. Thanks!
- Heikki
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2013-10-24 15:05:23 | Re: Re: [BUGS] BUG #7873: pg_restore --clean tries to drop tables that don't exist |
Previous Message | Heikki Linnakangas | 2013-10-24 12:42:38 | Re: random() generates collisions too early |