From: | Patrick Hatcher <PHatcher(at)macys(dot)com> |
---|---|
To: | pgsql-sql(at)postgresql(dot)org |
Subject: | Is there a faster way to do this? |
Date: | 2004-06-15 18:55:06 |
Message-ID: | OFCB14A796.8F1A370B-ON88256EB4.006549CA-88256EB4.0068B9F7@fds.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
pg: 7.4.2
RedHat 7.2
Can I get some advice on a possible faster way of doing this:
Scenario: Each day I update a column in a table with an internal
percentile value. To arrive at this value, I'll get a count of records
with sales > 0 and then divide this count by the total number of tile
groups I want. So for example:
Total records w/sales > 0 = 730,000
tile# = 100
total percentile groups (730,000 / 100):7300
Now that I have the total number of groups I need, I cycle through my
recordset, grab the key field and the current percentile number and stuff
the values into a temp table. I mod the current row counter against the
total percentile group number. If it is 0 then I add 1 to the current
percentile number . After inserting records into the temp file I then
update the main table.
Using the example above, the first 7300 records would get a percentile
number of 1, the next 7300 records would get a percentile number of 2,
then 3,4,5,etc.
Unfortunately, I am going record by record in a loop and the process takes
upwards of 20mins. Is there a faster way to do this? I thought about
using limit and offset, but I'm not sure how I would accomplish it.
Below is the function I currently use. Thanks for any help provided
CREATE OR REPLACE FUNCTION cdm.percentile_calc()
RETURNS text AS
'DECLARE
v_interval int4;
v_tile int4;
v_percentile int4;
v_check int4;
v_count int4;
v_rowcount int4;
myRec RECORD;
BEGIN
v_count:=0;
v_tile:= 100;
v_percentile:=1;
v_rowcount :=1;
v_check:=0;
/* Get count of records with val_purch > 0 */
select into v_count count(*) from cdm.cdm_indiv_mast where
val_purch_com >0;
/* this number will be used as part of our MOD to tell when to add one
to our percentile */
v_interval := v_count / v_tile;
CREATE TEMP TABLE cdmperct (f1 int8, f2 int2);
FOR myRec IN select indiv_key from cdm.cdm_indiv_mast where
val_purch_com >0 order by val_purch_com desc LOOP
INSERT INTO cdmperct values (myRec.indiv_key,v_percentile);
v_check = mod(v_rowcount,v_interval);
IF v_check = 0 THEN
v_percentile:=v_percentile+1;
END IF;
v_rowcount:= v_rowcount+1;
END LOOP;
UPDATE cdm.cdm_indiv_mast SET percentiler = f2 from cdmperct where
indiv_key = f1;
DROP TABLE cdmperct;
RETURN \'DONE\';
END; '
LANGUAGE 'plpgsql' IMMUTABLE;
Patrick Hatcher
Macys.Com
From | Date | Subject | |
---|---|---|---|
Next Message | Richard Huxton | 2004-06-15 20:17:05 | Re: Is there a faster way to do this? |
Previous Message | Kris Jurka | 2004-06-15 18:00:00 | Re: Prepare Statement |