From: | Erik Jones <erik(at)myemma(dot)com> |
---|---|
To: | kaloyan(at)digsys(dot)bg |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Obtaining random rows from a result set |
Date: | 2007-08-31 16:05:45 |
Message-ID: | 1F87CA98-2DFD-4551-812C-1B8F55506D09@myemma.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Aug 31, 2007, at 8:34 AM, Kaloyan Iliev wrote:
> Alban Hertroys wrote:
>
>> Hello,
>>
>> I've recently been busy improving a query that yields a fixed
>> number of
>> random records matching certain conditions. I have tried all the
>> usual
>> approaches, and although they do work, they're all limited in some
>> way
>> and don't translate really well to what you "want". They're
>> kludges, IMHO.
>>
>> The methods I've tried are explained quite well on
>> http://people.planetpostgresql.org/greg/index.php?/archives/40-
>> Getting-random-rows-from-a-database-table.html
>>
>> All these methods involve calculating a random number for every
>> record
>> in the result set at some point in time, which is really not what I'm
>> trying to model. I think the database should provide some means to
>> get
>> those records, so...
>>
>> Dear Santa,
>>
>> I'd like my database to have functionality analogue to how LIMIT
>> works,
>> but for other - non-sequential - algorithms.
>>
>> I was thinking along the lines of:
>>
>> SELECT *
>> FROM table
>> WHERE condition = true
>> RANDOM 5;
>>
>> Which would (up to) return 5 random rows from the result set, just as
>> LIMIT 5 returns (up to) the first 5 records in the result set.
>>
>>
>> Or maybe even with a custom function, so that you could get non-
>> linear
>> distributions:
>>
>> SELECT *
>> FROM table
>> WHERE condition = true
>> LIMIT 5 USING my_func();
>>
>> Where my_func() could be a user definable function accepting a
>> number
>> that should be (an estimate of?) the number of results being
>> returned so
>> that it can provide pointers to which rows in the resultset will be
>> returned from the query.
>>
>> Examples:
>> * random(maxrows) would return random rows from the resultset.
>> * median() would return the rows in the middle of the result set
>> (this
>> would require ordering to be meaningful).
>>
>> What do people think, is this feasable? Desirable? Necessary?
>>
>> If I'd have time I'd volunteer for at least looking into this, but
>> I'm
>> working on three projects simultaneously already. Alas...
>>
>> Regards,
>> Alban Hertroys.
>>
>>
> Hi,
> Why not generate a random number in your application and then:
>
> SELECT *
> FROM table_x
> WHERE condition = true
> OFFSET generated_random_number
> LIMIT xx
>
> Kaloyan Iliev
>
That won't work without some kind of a priori knowledge of how many
rows the query would return without the offset and limit.
Erik Jones
Software Developer | Emma®
erik(at)myemma(dot)com
800.595.4401 or 615.292.5888
615.292.0777 (fax)
Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
From | Date | Subject | |
---|---|---|---|
Next Message | Marco Bizzarri | 2007-08-31 16:54:32 | computing and updating the size of a table with large objects |
Previous Message | Erik Jones | 2007-08-31 15:53:03 | Re: URGENT: Whole DB down ("no space left on device") |