From: | Masaru Sugawara <rk73(at)sea(dot)plala(dot)or(dot)jp> |
---|---|
To: | gerard(at)interfold(dot)com |
Cc: | pgsql-novice(at)postgresql(dot)org |
Subject: | Fw: Selecting random rows using weights |
Date: | 2002-05-19 04:29:21 |
Message-ID: | 20020519132333.6BB4.RK73@sea.plala.or.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-novice |
On Sun, 12 May 2002 13:24:17 -0600
"G" <gerard(at)interfold(dot)com> wrote:
>
> Is there a way to select rows randomly using weight columns? (weighted
> random values)
>
I find out how to get what you want at random; however it's for only a row.
If you need ROWS, it's impossible to select them by using these functions
at least. Probably, another approach will be need.
-- First --
-- An id column must be unique.
create table tbl_random(id int4 unique not null, weight int4);
insert into tbl_random values(1, 1);
insert into tbl_random values(2, 10);
insert into tbl_random values(3, 0);
insert into tbl_random values(4, 3);
ALTER TABLE tbl_random ADD COLUMN r_start int4;
ALTER TABLE tbl_random ADD COLUMN r_end int4;
CREATE INDEX idx_random_r_start ON tbl_random(r_start);
CREATE INDEX idx_random_r_end ON tbl_random(r_end);
-- Second --
CREATE OR REPLACE FUNCTION fn_update_random() RETURNS boolean AS '
DECLARE
rec RECORD;
range int4 :=1;
BEGIN
FOR rec IN SELECT * FROM tbl_random WHERE weight > 0 LOOP
UPDATE tbl_random SET r_start = range,
r_end = weight + range - 1
WHERE id = rec.id;
SELECT INTO range (r_end + 1) FROM tbl_random
WHERE id = rec.id;
END LOOP;
FOR rec IN SELECT * FROM tbl_random WHERE weight = 0 LOOP
UPDATE tbl_random SET r_start = 0, r_end = 0
WHERE id = rec.id;
END LOOP;
RETURN true;
END;
' LANGUAGE 'plpgsql';
-- Third --
-- This query for updating the weighted range of r_start to r_end
-- needs to execute after rows are inserted into a target table,
-- ones are deleted from it, or the "weight" column in it is updated.
SELECT fn_update_random();
-- 4th --
-- A random number(0 to 1) is scaled by the maximum number of r_end, which
-- must be put into the subselect not to internally execute the random()
-- function twice. And if its scaled number is within the weighted range,
-- the unique row having the range will be selected.
SELECT t1.*, t2.r
FROM tbl_random AS t1,
(SELECT ceil(max(r_end) * (SELECT random())) AS r
FROM tbl_random) AS t2
WHERE t1.r_start <= t2.r AND t2.r <= t1.r_end;
-- Here is a bad example.
SELECT t1.*
FROM tbl_random AS t1
WHERE (SELECT ceil(max(r_end) * (SELECT random())) FROM tbl_random)
BETWEEN t1.r_start AND t1.r_end;
Regards,
Masaru Sugawara
From | Date | Subject | |
---|---|---|---|
Next Message | thiemo | 2002-05-19 08:15:06 | Forgotten the master password of db |
Previous Message | Tom Lane | 2002-05-18 15:36:21 | Re: ADD CONSTRAINT NOT NULL, how? |