| From: | ptjm(at)interlog(dot)com (Patrick TJ McPhee) |
|---|---|
| To: | pgsql-general(at)postgresql(dot)org |
| Subject: | Re: Selecting K random rows - efficiently! |
| Date: | 2007-10-26 04:40:35 |
| Message-ID: | 13i2ru38icpr51a@corp.supernews.com |
| Lists: | pgsql-general |
In article <ffnid8$1q2t$1(at)news(dot)hub(dot)org>, cluster <skrald(at)amossen(dot)dk> wrote:
% > How important is true randomness?
%
% The goal is an even distribution but currently I have not seen any way
% to produce any kind of random sampling efficiently. Notice the word
How about generating the ctid randomly? You can get the number of pages
from pg_class and estimate the number of rows per page, either from the
tuple count in pg_class or from what you know about the data. Then
generate two series of random numbers, one from 0 to the number of pages
minus one and the other from 1 to the number of rows per page, and keep
probing the resulting ctids until you have enough rows. Assuming there
aren't too many dead tuples and your estimates are good, this should
retrieve n rows with roughly n look-ups.
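A minimal sketch of the two random series described above (Python for illustration; the table name, the probe query, and the retry-on-miss handling are assumptions, not part of the original suggestion):

```python
import random

def random_ctids(n_pages, rows_per_page, k):
    """Generate k distinct candidate ctid strings '(page,row)'.

    n_pages and rows_per_page would come from pg_class estimates
    (relpages and reltuples). Assumes k <= n_pages * rows_per_page.
    """
    seen = set()
    while len(seen) < k:
        page = random.randrange(n_pages)              # pages are 0-based
        row = random.randrange(1, rows_per_page + 1)  # line pointers are 1-based
        seen.add((page, row))
    return ["(%d,%d)" % pr for pr in sorted(seen)]

# Each candidate would then be probed with something like
#   SELECT * FROM mytable WHERE ctid = '(page,row)'::tid
# and any misses (dead tuples, rows past the end of a page) retried
# with fresh random ctids until k live rows have been fetched.
```

If the estimates are accurate, most probes hit a live tuple, so fetching k rows costs roughly k single-tuple look-ups rather than a full scan.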
If your estimates are low, there will be tuples that can never be selected,
and so far as I know there's no built-in way to generate a random ctid in a
stock postgres database, but apart from that it seems like a good plan. If
efficiency is important, you could create a C function which returns a
series of random tids and join on that.
--
Patrick TJ McPhee
North York Canada
ptjm(at)interlog(dot)com