From: | Jean-Luc Lachance <jllachan(at)nsd(dot)ca> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "Gavin M(dot) Roy" <gmr(at)justsportsusa(dot)com>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: ORDER BY random() LIMIT 1 slowness |
Date: | 2002-12-17 16:04:37 |
Message-ID: | 3DFF4B15.C8B3A8B9@nsd.ca |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Gavin,
Assuming that you have a serial column rand on poetry and you did not
delete any row,
here is my suggestion:
CREATE TABLE poetry ( rand SERIAL, ... );
SELECT * FROM poetry WHERE rand = (
SELECT int8( curval( 'poetry_rand_seq') * random()));
JLL
Tom Lane wrote:
>
> "Gavin M. Roy" <gmr(at)justsportsusa(dot)com> writes:
> > SELECT * FROM poetry ORDER BY random() LIMIT 1;
> > [ is slow for 35000 rows ]
>
> Yeah. Basically this query is implemented as
> (a) select all 35000 rows of "poetry";
> (b) compute a random() value for each row;
> (c) sort by the random() values;
> (d) take the first row, discard the rest.
>
> The good news: this gives you a pretty-durn-random selection. The bad
> news: you didn't really care about choosing a random ordering of the
> other 34999 rows, but it computed one anyway.
>
> This problem's been discussed before, but I've not seen any really
> ideal answer. SQL is not designed to offer unpredictable results ;-)
>
> If you have a reasonably-uniformly-distributed index key on the rows,
> you can try something like
>
> select * from poetry where key > (scale * random() + min)
> order by key limit 1;
>
> (where scale and min are chosen to make the result cover the range of
> index keys). Given an index on "key" this should pick a result quickly
> via an indexscan+limit plan. However, if the distribution of keys isn't
> real uniform then you won't get real random results --- keys just above
> larger gaps will be selected with unfairly high probability. You also
> have to know the min and max keys accurately.
>
> I recall some folks have suggested generating an actual random column
> (ie, something initialized with "default random()" and suitably indexed)
> but this seems like a lot of overhead ... you'd have to be mighty
> concerned with the exactness of your random selection to put up with
> that, I think.
>
> There are some other suggestions in the archives, IIRC.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
From | Date | Subject | |
---|---|---|---|
Next Message | Neil Conway | 2002-12-17 16:31:52 | Re: 7.3 Prepared statements |
Previous Message | Shridhar Daithankar | 2002-12-17 16:03:21 | Re: Server testing. |