Quick Links

Re: ORDER BY random() LIMIT 1 slowness

From:	Jean-Luc Lachance <jllachan(at)nsd(dot)ca>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Gavin M(dot) Roy" <gmr(at)justsportsusa(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: ORDER BY random() LIMIT 1 slowness
Date:	2002-12-17 16:04:37
Message-ID:	3DFF4B15.C8B3A8B9@nsd.ca
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Gavin,

Assuming that you have a serial column rand on poetry and you did not
delete any row,
here is my suggestion:

CREATE TABLE poetry ( rand SERIAL, ... );

SELECT * FROM poetry WHERE rand = (
SELECT int8( curval( 'poetry_rand_seq') * random()));

JLL

Tom Lane wrote:
>
> "Gavin M. Roy" <gmr(at)justsportsusa(dot)com> writes:
> > SELECT * FROM poetry ORDER BY random() LIMIT 1;
> > [ is slow for 35000 rows ]
>
> Yeah. Basically this query is implemented as
> (a) select all 35000 rows of "poetry";
> (b) compute a random() value for each row;
> (c) sort by the random() values;
> (d) take the first row, discard the rest.
>
> The good news: this gives you a pretty-durn-random selection. The bad
> news: you didn't really care about choosing a random ordering of the
> other 34999 rows, but it computed one anyway.
>
> This problem's been discussed before, but I've not seen any really
> ideal answer. SQL is not designed to offer unpredictable results ;-)
>
> If you have a reasonably-uniformly-distributed index key on the rows,
> you can try something like
>
> select * from poetry where key > (scale * random() + min)
> order by key limit 1;
>
> (where scale and min are chosen to make the result cover the range of
> index keys). Given an index on "key" this should pick a result quickly
> via an indexscan+limit plan. However, if the distribution of keys isn't
> real uniform then you won't get real random results --- keys just above
> larger gaps will be selected with unfairly high probability. You also
> have to know the min and max keys accurately.
>
> I recall some folks have suggested generating an actual random column
> (ie, something initialized with "default random()" and suitably indexed)
> but this seems like a lot of overhead ... you'd have to be mighty
> concerned with the exactness of your random selection to put up with
> that, I think.
>
> There are some other suggestions in the archives, IIRC.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org

In response to

Re: ORDER BY random() LIMIT 1 slowness at 2002-12-17 05:44:13 from Tom Lane

Responses

Re: ORDER BY random() LIMIT 1 slowness at 2002-12-18 11:24:28 from SZUCS Gábor

Browse pgsql-general by date

	From	Date	Subject
Next Message	Neil Conway	2002-12-17 16:31:52	Re: 7.3 Prepared statements
Previous Message	Shridhar Daithankar	2002-12-17 16:03:21	Re: Server testing.