Re: random() generates collisions too early

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Honza Horak <hhorak(at)redhat(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: random() generates collisions too early
Date: 2013-10-24 14:04:00
Message-ID: 526928D0.6090707@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 22.10.2013 15:41, Honza Horak wrote:
> On 10/21/2013 05:14 PM, Tom Lane wrote:
>> Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
>>> On 18.10.2013 14:55, Honza Horak wrote:
>>>> The results covered only 181383 distinct values, and 68 values
>>>> repeated four or five times each. We should at least consider using a
>>>> higher-entropy seed.
>>
>>> Interesting. PostgreSQL's random() function just calls the underlying
>>> libc random() function. I assume you tested this on with Linux and
>>> glibc.
>>
>> I agree with the theory that this probably isn't the fault of the
>> random()
>> function as such, but with our code to reset the random seed when forking
>> a postmaster child process. Note that the test case is only examining the
>> first random value created in each process. So basically what this is
>> measuring is the number of different seed values we use.
>>
>> regards, tom lane
>>
>
> After playing a bit more with random() value and the seed defined in
> src/backend/postmaster/postmaster.c:4038 it seems really like the seed
> doesn't have very good characteristic. The PID number used there ensures
> the value to be different in child processes, but when only xor-ed with
> usecs, it doesn't enlarge the total data domain of the seed, which stays
> 1M values. Then having in mind the birthday paradox and probably not
> ideal PRNG, it doesn't seem unrealistic to get two same random numbers
> in only hundreds/thousands of tries.
>
> In order to enlarge possible data domain of the seed we can include sec
> count as well and shift some xor-ed variables to use the whole range or
> unsigned int. The attached patch uses a way that gave me much better
> results (collision found approx. after a hundred thousands of values):
>
> - srandom((unsigned int) (MyProcPid ^ usecs));
> + srandom((unsigned int) (MyProcPid ^ (usecs << 12) ^ secs));

Seems reasonable, committed. Thanks!

- Heikki

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Robert Haas 2013-10-24 15:05:23 Re: Re: [BUGS] BUG #7873: pg_restore --clean tries to drop tables that don't exist
Previous Message Heikki Linnakangas 2013-10-24 12:42:38 Re: random() generates collisions too early