Re: Generating random unique alphanumeric IDs

From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Generating random unique alphanumeric IDs
Date: 2009-08-16 12:25:26
Message-ID: 20090816122526.GW5407@samason.me.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Sun, Aug 16, 2009 at 12:57:34PM +0100, Thom Brown wrote:
> > SELECT array_to_string(array((
> > SELECT SUBSTRING('abcdefghjklmnpqrstuvwxyz23456789'
> > FROM mod((random()*32)::int, 32)+1 FOR 1)
> > FROM generate_series(1,5))),'');

I've just had a look and PG does actually seem to be returning values as
I'd expect, i.e. 0 <= n < 1. So the following would work:

floor(random()*32)::int+1

instead of the mod hack. Distribution looks reasonably flat (this is
good):

char %occurances
1 3.1222
2 3.1329
3 3.1269
4 3.1236
5 3.1233
6 3.1310
7 3.1226
8 3.1298
9 3.1229
10 3.1294
11 3.1192
12 3.1249
13 3.1267
14 3.1236
15 3.1190
16 3.1279
17 3.1232
18 3.1218
19 3.1314
20 3.1091
21 3.1337
22 3.1239
23 3.1184
24 3.1347
25 3.1205
26 3.1160
27 3.1219
28 3.1344
29 3.1118
30 3.1256
31 3.1408
32 3.1255

> I'm not sure why you're saying that there's a 50%
> chance of duplication after 7240 values though. With 33 million
> combinations, I would have thought that duplications would become equally
> likely at the 16,777,216 mark.

No, that's why I pointed out birthday attacks---collisions happen much
more often than you'd expect. Get 23 people in a room and you have a
50% chance of two people having the same birthday--not 150 people. This
is why it's called the birthday attack and it's one of the basic tests
for hash functions--any bias in their output will shrink this number
even further.

--
Sam http://samason.me.uk/

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message NTPT 2009-08-16 13:06:15 Rapid Seek Devices (feature request)
Previous Message Ivan Sergio Borgonovo 2009-08-16 12:02:14 Re: Generating random unique alphanumeric IDs