From: | Kevin Murphy <murphy(at)genome(dot)chop(dot)edu> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | selecting random rows |
Date: | 2007-01-22 17:00:20 |
Message-ID: | 45B4EDA4.3080601@genome.chop.edu |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Here is Josh Berkus' solution for randomly picking a single row from a
query. I think the FAQ (www.postgresql.org/docs/faqs.FAQ.html#item4.1)
could be updated with a link to this solution, which is more practical
for large queries.
www.powerpostgresql.com/Random_Aggregate
Here is a discussion by Greg Sabino Mullane about getting random subsets
of table rows. Greg's approach involves modifying the table you want to
query from and is not very general-purpose (although it is a nice read).
I've seen Josh's approach extended to the multiple-row-subset case. I
think the perl pseudo-code would look like this:
# $n is the desired number of rows
while(<>)
{
if(rand($.)<$n) # This is the probability that the current line should
be in the output if it were the last line of the
input
{
# Remove (at random) one of the current
selections
splice(@lines,rand(@lines),1) if @lines==$n;
# and add the latest selection at the
end
push(@lines,$_);
}
}
Would it be possible to implement this as a function in PG?
Aside: I'm fantasizing about a postgresql archive of user-submitted
functions. Is the pgfoundry the closest thing to this?
-Kevin Murphy
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Fitzpatrick | 2007-01-22 17:05:52 | Re: MSSQL/ASP migration |
Previous Message | Ron Peterson | 2007-01-22 16:49:46 | Re: CAST function for user defined type |