Re: Random-looking primary keys in the range 100000..999999

From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: Kynn Jones <kynnjo(at)gmail(dot)com>, pgsql-general General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Random-looking primary keys in the range 100000..999999
Date: 2014-07-05 04:02:39
Message-ID: 53B778DF.6010604@archidevsys.co.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 05/07/14 15:48, Gavin Flower wrote:
> On 05/07/14 01:24, Kynn Jones wrote:
>> I'm looking for a way to implement pseudorandom primary keys in the
>> range 100000..999999.
>>
>> The randomization scheme does not need to be cryptographically
>> strong. As long as it is not easy to figure out in a few minutes
>> it's good enough.
>>
>> My starting point for this is the following earlier message to this list:
>>
>> http://www.postgresql.org/message-id/49F96730.4000706@postnewspapers.com.au
>>
>> The answer given to it here
>>
>> http://www.postgresql.org/message-id/448163db-cac5-4e99-8c4c-57cbc6f6af78@mm
>>
>> ...is really cool, but I don't see how to modify it for the case
>> where the domain of the permutation has a cardinality that is not a
>> power of 2, as it is in my case (cardinality = 900000).
>>
>> ---
>>
>> (In the crypto world there are "format preserving encryption"
>> techniques that probably could do what I want to do, but their focus
>> on cryptographic strength makes learning and implementing them tough
>> going, plus, the performance will probably be poor, since high
>> workloads are an asset for such crypto applications. Since
>> cryptographic strength is not something I need, I'm trying to find
>> non-crypt-grade alternatives.)
>>
>> Thanks in advance!
>>
>> kynn
>>
> Hi Kynn,
>
> How about (note that 'payload' could be any set of valid columns):
>
> -- using a crude Linear Congruential Generator
> -- not very random, but does NOT create duplicates
>
>
> DROP TABLE IF EXISTS rtab;
> DROP SEQUENCE IF EXISTS rseq;
>
> CREATE SEQUENCE rseq;
>
> CREATE TABLE rtab
> (
> id int PRIMARY KEY default(100000 + (nextval('rseq') * 543537
> + 997) % 900000),
> payload int NOT NULL
> );
>
> INSERT INTO rtab (payload) VALUES (generate_series(1, 100000));
>
> TABLE rtab;
>
> Sample output:
>
> id | payload
> --------+---------
> 644534 | 1
> 288071 | 2
> 831608 | 3
> 475145 | 4
> 118682 | 5
> 662219 | 6
> 305756 | 7
> 849293 | 8
> 492830 | 9
> 136367 | 10
> 679904 | 11
> 323441 | 12
> 866978 | 13
> 510515 | 14
> 154052 | 15
> 697589 | 16
> 341126 | 17
> 884663 | 18
> 528200 | 19
> 171737 | 20
>
>
>
> Cheers,
> Gavin
Hmm...

for a 10 times larger range
id int PRIMARY KEY default(1000000 + (nextval('rseq') *
543537 + 997) % 9000000),
also works!

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2014-07-05 04:18:51 Re: pg_dump slower than pg_restore
Previous Message Gavin Flower 2014-07-05 03:48:35 Re: Random-looking primary keys in the range 100000..999999