Re: Random-looking primary keys in the range 100000..999999

From: Kynn Jones <kynnjo(at)gmail(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Kynn Jones <kynnjo(at)gmail(dot)com>, pgsql-general General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Random-looking primary keys in the range 100000..999999
Date: 2014-07-08 19:04:45
Message-ID: CAFvQaj5_v8DPAKeJM_8fa6WapWP78ucjH5a-qcj7rv6BP1KZHQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thanks to Gavin and Martijn for their suggestions. They're both simple
good-ol' LCGs, and both avoid the need to check for collisions.

I ultimately went with a multiplicative LCG, like Martijn's, mostly because
I understand better how it avoids collisions, so it was easier for me to
tweak it in various ways.

In particular, I changed the prime number from 899981 to the very lucky
prime 900001. This happens to work *perfectly*, because the range of such
a generator is p-1, not p. (BTW, Martijn's choice of the "random" 2345 for
the multiplier was a somewhat lucky one, since such generators are not full
for arbitrary multipliers; for example, the one with modulus 899981 is not
full for a multiplier of 3456, say.)

I also followed Martijn's pointer regarding the 3-argument form of python's
pow function, and implemented a 3-argument pow for PL/PgSQL. I include all
the code below, including a snippet borrowed from Gavin's post, and
modified here and there. (I'm not very experienced with PL/PgSQL, so
please feel free to point out ways in which my PL/PgSQL code can be
improved.)

First the functions:

CREATE OR REPLACE FUNCTION pow_mod(bigx bigint, n bigint, m bigint)
returns bigint AS $$
DECLARE
x bigint;
xx bigint;
BEGIN
IF n = 0 THEN RETURN 1; END IF;

x := bigx % m;
xx := (x * x) % m;

IF n % 2 = 0 THEN
RETURN pow_mod(xx, n/2, m);
ELSE
RETURN (x * pow_mod(xx, (n-1)/2, m)) % m;
END IF;

END;
$$ LANGUAGE plpgsql strict immutable;

-- "mcg" = "multiplicative congruential generator"
CREATE OR REPLACE FUNCTION mcg_900001(i bigint) returns int AS $$
BEGIN
-- CHECK (0 < i AND i < 900001)

RETURN 99999 + pow_mod(<INSERT YOUR MULTIPLIER HERE>, i, 900001);
END;
$$ LANGUAGE plpgsql strict immutable;

And here's a small demo:

DROP TABLE IF EXISTS rtab;
DROP SEQUENCE IF EXISTS rseq;

CREATE SEQUENCE rseq;

CREATE TABLE rtab
(
id int PRIMARY KEY DEFAULT mcg_900001(nextval('rseq')),
payload int NOT NULL
);

\timing on \\ INSERT INTO rtab (payload) VALUES (generate_series(1,
900000)); \timing off
-- Timing is on.
-- INSERT 0 900000
-- Time: 201450.781 ms
-- Timing is off.

SELECT * FROM rtab WHERE 449990 < payload AND payload < 450011;
-- id | payload
-- --------+---------
-- 539815 | 449991
-- 901731 | 449992
-- 878336 | 449993
-- 564275 | 449994
-- 863664 | 449995
-- 720159 | 449996
-- 987833 | 449997
-- 999471 | 449998
-- 999977 | 449999
-- 999999 | 450000
-- 921739 | 450001
-- 722684 | 450002
-- 596638 | 450003
-- 121592 | 450004
-- 687895 | 450005
-- 477734 | 450006
-- 585988 | 450007
-- 942869 | 450008
-- 175776 | 450009
-- 377207 | 450010
-- (20 rows)

kj

On Sat, Jul 5, 2014 at 4:35 AM, Martijn van Oosterhout <kleptog(at)svana(dot)org>
wrote:

> On Fri, Jul 04, 2014 at 09:24:31AM -0400, Kynn Jones wrote:
> > I'm looking for a way to implement pseudorandom primary keys in the range
> > 100000..999999.
> >
> > The randomization scheme does not need to be cryptographically strong.
> As
> > long as it is not easy to figure out in a few minutes it's good enough.
>
> Well, a trick that produces a not too easy to guess sequence is:
>
> X(n) = p^n mod q
>
> where q is prime. Pick the largest prime that will fit, in this case
> 899981 (I beleive) and some random p, say 2345.
>
> Then 100000 + (2345^n) mod 899981
>
> should be a sequence fitting your purpose. Unfortunatly, the pow()
> function in Postgres can't be used here (too slow and it overflows),
> but python has a helpful function:
>
> In [113]: len( set( pow(2345, n, 899981) for n in range(899981) ) )
> Out[113]: 899980
>
> You could probably write an equivalent function in Postgres if
> necessary.
>
> Hope this helps,
> --
> Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> > He who writes carelessly confesses thereby at the very outset that he
> does
> > not attach much importance to his own thoughts.
> -- Arthur Schopenhauer
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.10 (GNU/Linux)
>
> iQIVAwUBU7e450vt++dL5i1EAQhwew/9Fps1rkjMl85kAhD4nj9i5Gy+Y6T71vbS
> gXkHyJOVHr9r9kT8/1shG8OtTDEIKI1FEjDD5wdkTvj+K//wswPcpCIcj5eJVu5K
> 56v8ITYnc3/YeYthoBI829adAreP7kjBAJlB8lENTAbxkdJmRBEGA3KjEnSLj7I/
> pdqlrrbkUq7r/OBFlJYFnv/YXLAFeOWQRAk+Be+UorAUmkrvoA0g7gW4VEFnQ1Qk
> k1kTYIEU3HUXVDHUeYTC2jjLj7cFVhYaQ52FA950MzkpkqFAej34gpitcOFC8yf+
> KSglMq4nAFNF6VCU50RwPLjMIXXbHTSYxjJ5n3qYo4CExlg0wBLcmuu25GHc69qP
> wEx71lPvXb4yfI3YNNHcH3Cwgl46u5M5Dt2aqWDcr+haAy8Hmhm5zqjTcfpUhyD+
> efi8B512YDr4HoDV6qEKx0MdjHUFptX34L8tjkmnNYQlXj89ATE82lUoTusiIxts
> axwJwbjl81cg3ZbtfoWPQ3LXXSRNI0NhIkHX0k2xp3uJ+TX76UmPZYSzQ3M2QrhX
> atFCkcE4RqY26zwtxCp27yjnKMsMkEeo4z7JIQKjkwLzHGPavNd2PFV1EfCXNhwz
> aDXXZHzwymJjhgP1BH0mXrL6VD0UiQb3RTRH82RpG0MaNBRImcncCAjKlN5UzDur
> dwQY8zHxuOQ=
> =IJFu
> -----END PGP SIGNATURE-----
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Merlin Moncure 2014-07-08 21:40:20 Re: Re : Re : Query "top 10 and others"
Previous Message Jim Mlodgenski 2014-07-08 16:49:52 Re: debugging with gdb in postgres