Re: Random-looking primary keys in the range 100000..999999

From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: Kynn Jones <kynnjo(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-general General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Random-looking primary keys in the range 100000..999999
Date: 2014-07-09 02:04:51
Message-ID: 53BCA343.7020703@archidevsys.co.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Please don't top post!

See below for my comments.

On 09/07/14 07:04, Kynn Jones wrote:
> Thanks to Gavin and Martijn for their suggestions. They're both simple
> good-ol' LCGs, and both avoid the need to check for collisions.
>
> I ultimately went with a multiplicative LCG, like Martijn's, mostly
> because I understand better how it avoids collisions, so it was easier
> for me to tweak it in various ways.
>
> In particular, I changed the prime number from 899981 to the very
> lucky prime 900001. This happens to work *perfectly*, because the
> range of such a generator is p-1, not p. (BTW, Martijn's choice of
> the "random" 2345 for the multiplier was a somewhat lucky one, since
> such generators are not full for arbitrary multipliers; for example,
> the one with modulus 899981 is not full for a multiplier of 3456, say.)
>
> I also followed Martijn's pointer regarding the 3-argument form of
> python's pow function, and implemented a 3-argument pow for PL/PgSQL.
> I include all the code below, including a snippet borrowed from
> Gavin's post, and modified here and there. (I'm not very experienced
> with PL/PgSQL, so please feel free to point out ways in which my
> PL/PgSQL code can be improved.)
>
> First the functions:
>
> CREATE OR REPLACE FUNCTION pow_mod(bigx bigint, n bigint, m
> bigint) returns bigint AS $$
> DECLARE
> x bigint;
> xx bigint;
> BEGIN
> IF n = 0 THEN RETURN 1; END IF;
>
> x := bigx % m;
> xx := (x * x) % m;
>
> IF n % 2 = 0 THEN
> RETURN pow_mod(xx, n/2, m);
> ELSE
> RETURN (x * pow_mod(xx, (n-1)/2, m)) % m;
> END IF;
>
> END;
> $$ LANGUAGE plpgsql strict immutable;
>
>
> -- "mcg" = "multiplicative congruential generator"
> CREATE OR REPLACE FUNCTION mcg_900001(i bigint) returns int AS $$
> BEGIN
> -- CHECK (0 < i AND i < 900001)
> RETURN 99999 + pow_mod(<INSERT YOUR MULTIPLIER HERE>, i, 900001);
> END;
> $$ LANGUAGE plpgsql strict immutable;
>
>
> And here's a small demo:
>
> DROP TABLE IF EXISTS rtab;
> DROP SEQUENCE IF EXISTS rseq;
>
> CREATE SEQUENCE rseq;
>
> CREATE TABLE rtab
> (
> id int PRIMARY KEY DEFAULT mcg_900001(nextval('rseq')),
> payload int NOT NULL
> );
>
> \timing on \\ INSERT INTO rtab (payload) VALUES
> (generate_series(1, 900000)); \timing off
> -- Timing is on.
> -- INSERT 0 900000
> -- Time: 201450.781 ms
> -- Timing is off.
>
> SELECT * FROM rtab WHERE 449990 < payload AND payload < 450011;
> -- id | payload
> -- --------+---------
> -- 539815 | 449991
> -- 901731 | 449992
> -- 878336 | 449993
> -- 564275 | 449994
> -- 863664 | 449995
> -- 720159 | 449996
> -- 987833 | 449997
> -- 999471 | 449998
> -- 999977 | 449999
> -- 999999 | 450000
> -- 921739 | 450001
> -- 722684 | 450002
> -- 596638 | 450003
> -- 121592 | 450004
> -- 687895 | 450005
> -- 477734 | 450006
> -- 585988 | 450007
> -- 942869 | 450008
> -- 175776 | 450009
> -- 377207 | 450010
> -- (20 rows)
>
> kj
>
>
>
> On Sat, Jul 5, 2014 at 4:35 AM, Martijn van Oosterhout
> <kleptog(at)svana(dot)org <mailto:kleptog(at)svana(dot)org>> wrote:
>
> On Fri, Jul 04, 2014 at 09:24:31AM -0400, Kynn Jones wrote:
> > I'm looking for a way to implement pseudorandom primary keys in
> the range
> > 100000..999999.
> >
> > The randomization scheme does not need to be cryptographically
> strong. As
> > long as it is not easy to figure out in a few minutes it's good
> enough.
>
> Well, a trick that produces a not too easy to guess sequence is:
>
> X(n) = p^n mod q
>
> where q is prime. Pick the largest prime that will fit, in this case
> 899981 (I beleive) and some random p, say 2345.
>
> Then 100000 + (2345^n) mod 899981
>
> should be a sequence fitting your purpose. Unfortunatly, the pow()
> function in Postgres can't be used here (too slow and it overflows),
> but python has a helpful function:
>
> In [113]: len( set( pow(2345, n, 899981) for n in range(899981) ) )
> Out[113]: 899980
>
> You could probably write an equivalent function in Postgres if
> necessary.
>
> Hope this helps,
> --
> Martijn van Oosterhout <kleptog(at)svana(dot)org
> <mailto:kleptog(at)svana(dot)org>> http://svana.org/kleptog/
> > He who writes carelessly confesses thereby at the very outset
> that he does
> > not attach much importance to his own thoughts.
> -- Arthur Schopenhauer
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.10 (GNU/Linux)
>
> iQIVAwUBU7e450vt++dL5i1EAQhwew/9Fps1rkjMl85kAhD4nj9i5Gy+Y6T71vbS
> gXkHyJOVHr9r9kT8/1shG8OtTDEIKI1FEjDD5wdkTvj+K//wswPcpCIcj5eJVu5K
> 56v8ITYnc3/YeYthoBI829adAreP7kjBAJlB8lENTAbxkdJmRBEGA3KjEnSLj7I/
> pdqlrrbkUq7r/OBFlJYFnv/YXLAFeOWQRAk+Be+UorAUmkrvoA0g7gW4VEFnQ1Qk
> k1kTYIEU3HUXVDHUeYTC2jjLj7cFVhYaQ52FA950MzkpkqFAej34gpitcOFC8yf+
> KSglMq4nAFNF6VCU50RwPLjMIXXbHTSYxjJ5n3qYo4CExlg0wBLcmuu25GHc69qP
> wEx71lPvXb4yfI3YNNHcH3Cwgl46u5M5Dt2aqWDcr+haAy8Hmhm5zqjTcfpUhyD+
> efi8B512YDr4HoDV6qEKx0MdjHUFptX34L8tjkmnNYQlXj89ATE82lUoTusiIxts
> axwJwbjl81cg3ZbtfoWPQ3LXXSRNI0NhIkHX0k2xp3uJ+TX76UmPZYSzQ3M2QrhX
> atFCkcE4RqY26zwtxCp27yjnKMsMkEeo4z7JIQKjkwLzHGPavNd2PFV1EfCXNhwz
> aDXXZHzwymJjhgP1BH0mXrL6VD0UiQb3RTRH82RpG0MaNBRImcncCAjKlN5UzDur
> dwQY8zHxuOQ=
> =IJFu
> -----END PGP SIGNATURE-----
>
>
Didn't I mention that my code & variable names are copyrighted and my
methods patented? :-)

More seriously, its great that you picked out what you want from
multiple replies! Understanding what is given to you is far better than
blindly cutting & pasting.

Actually I just realized that having an additive constant is essentially
meaningless in this Use Case, as we are feeding in a sequence of
consecutive integers.

Cheers,
Gavin

In response to

Browse pgsql-general by date

  From Date Subject
Next Message M Tarkeshwar Rao 2014-07-09 09:18:40 Insert query hangs
Previous Message Merlin Moncure 2014-07-08 21:40:20 Re: Re : Re : Query "top 10 and others"