| From: | Alban Hertroys <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl> | 
|---|---|
| To: | Yan Cheng CHEOK <yccheok(at)yahoo(dot)com> | 
| Cc: | pgsql-general(at)postgresql(dot)org | 
| Subject: | Re: Alternative to UPDATE (As COPY to INSERT) | 
| Date: | 2010-02-23 12:19:13 | 
| Message-ID: | ECEC31F4-B351-4C48-8455-34559EB3AD73@solfertje.student.utwente.nl | 
| Lists: | pgsql-general | 
On 23 Feb 2010, at 10:26, Yan Cheng CHEOK wrote:
> I realize that UPDATE performance in PostgreSQL doesn't meet my expectations.
> 
> Is there any fast alternative to UPDATE, the way fast COPY is an alternative to INSERT?
Well, since an UPDATE is just a DELETE + INSERT and you're already doing this in one transaction, you could use DELETE + COPY instead. That's not as easy to do with your current approach though - converting those arrays to something COPY understands will probably take about as much time as your function is taking now.
It would probably be convenient to use a staging table (maybe temporary) to put the new values in before you act on them, so you don't need that hassle with arrays. I don't know where your data is coming from (I recall you work with a measurement machine in a capacitor plant?), but you could use COPY to fill the staging table (no constraints, yay!) and then:
  BEGIN;
  DELETE FROM statistic
   WHERE (fk_lot_id, measurement_type, statistic_type) IN
         (SELECT fk_lot_id, measurement_type, statistic_type FROM staging_table);
  INSERT INTO statistic (value, fk_lot_id, measurement_type, statistic_type)
  SELECT value, fk_lot_id, measurement_type, statistic_type FROM staging_table;
  TRUNCATE staging_table;
  COMMIT;
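For the COPY step that fills the staging table beforehand, something along these lines should do. The column order and the tab-separated input format are assumptions on my part:

  -- Feed rows into the staging table; data follows on stdin,
  -- tab-separated, terminated by a line containing only "\.".
  COPY staging_table (value, fk_lot_id, measurement_type, statistic_type) FROM STDIN;
  4.94422589800714	1	Pad1 Area	Average
  \.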
This isn't concurrency-safe, so you need to make sure no values are added to the staging table while you're doing the above.
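If you can't guarantee that on the client side, a simple option (just a sketch; it assumes your writers can afford to wait) is to take a lock on the staging table at the start of the transaction:

  BEGIN;
  -- EXCLUSIVE mode blocks concurrent INSERTs (and other writes) on
  -- staging_table until COMMIT, but still allows plain SELECTs.
  LOCK TABLE staging_table IN EXCLUSIVE MODE;
  -- ... DELETE / INSERT / TRUNCATE as above ...
  COMMIT;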
I'm not sure you'll get below 1 ms this way; that's a pretty steep requirement. But considering the "slow" way already manages 20 ms, it might just work.
Of course, if you have a staging table you could choose to process it less frequently; that gives you more time per batch and reduces the overhead a little. I'm guessing your data comes in 24/7, so finding the right batch size is part of your problem.
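Before you abandon UPDATE entirely, you could also try replacing the per-element loop in your function with two set-based statements. This is an untested sketch using the same variables as your function below; generate_subscripts() needs 8.4 or later, and I qualify the subscript as gs.i so it can't clash with your plpgsql loop variable i:

  -- One UPDATE for all array elements at once, instead of one per
  -- loop iteration:
  UPDATE statistic s
     SET value = _values[gs.i]
    FROM generate_subscripts(_values, 1) AS gs(i)
   WHERE s.fk_lot_id = _lotID
     AND s.measurement_type = _measurementTypes[gs.i]
     AND s.statistic_type = _statisticTypes[gs.i];

  -- Then insert the elements the UPDATE didn't match:
  INSERT INTO statistic (fk_lot_id, value, measurement_type, statistic_type)
  SELECT _lotID, _values[gs.i], _measurementTypes[gs.i], _statisticTypes[gs.i]
    FROM generate_subscripts(_values, 1) AS gs(i)
   WHERE NOT EXISTS (
           SELECT 1 FROM statistic s
            WHERE s.fk_lot_id = _lotID
              AND s.measurement_type = _measurementTypes[gs.i]
              AND s.statistic_type = _statisticTypes[gs.i]);

That avoids the per-element EXECUTE and planning overhead, which is probably where a good deal of those 20 ms go.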
> Thanks!
> 
> I am using update in the following case :
> 
> CREATE OR REPLACE FUNCTION update_or_insert_statistic(integer, text[], text[], double precision[])
>  RETURNS void AS
> $BODY$DECLARE    
>  _lotID ALIAS FOR $1;
>  _measurementTypes ALIAS FOR $2;
>  _statisticTypes ALIAS FOR $3;
>  _values ALIAS FOR $4;
>  _row_count int; 
>  i int;  
> BEGIN
>    -- Parameters validation.
>    IF array_upper(_measurementTypes, 1) != array_upper(_statisticTypes, 1) OR array_upper(_measurementTypes, 1) != array_upper(_values, 1) THEN
>        RAISE EXCEPTION 'Inconsistency in array size';
>    END IF;
> 
>    FOR i IN SELECT generate_subscripts(_measurementTypes, 1)
>    LOOP
>        EXECUTE 'UPDATE statistic SET value = $1 WHERE fk_lot_id = $2 AND measurement_type = $3 AND statistic_type = $4'
>        USING _values[i], _lotID, _measurementTypes[i], _statisticTypes[i];
> 
>        GET DIAGNOSTICS _row_count = ROW_COUNT;
> 
>        IF _row_count = 0 THEN
>            EXECUTE 'INSERT INTO statistic(fk_lot_id, "value", measurement_type, statistic_type) VALUES ($1, $2, $3, $4)'
>            USING _lotID, _values[i], _measurementTypes[i], _statisticTypes[i];
>        END IF; 
>    END LOOP;
> END;$BODY$
> LANGUAGE plpgsql;
> 
> I use the following "minimal" version
> 
> SELECT * FROM update_or_insert_statistic(1,
> array['Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch'],
> array['LSL','USL','Nominal','Average','StdDev','StdDev3','CPK','Min','Max','Total','Valid','Invalid','Variance','LSL','USL','Nominal','Average','StdDev','StdDev3','CPK','Min','Max','Total','Valid','Invalid','Variance'],
> array[0,0,0,4.94422589800714,3.16063453753607,0,0,1.01620532853175,9.98406933805353,20,20,0,9.98961067986587,0,0,0,6.56297341837825,2.51273949943937,0,0,1.69188512833033,9.56794946134831,20,20,0,6.31385979204282])
> 
> It takes around 20ms :(
> 
> I am expecting < 1ms
> 
> Or shall I just go back to plain text in this case?
> 
> Thanks and Regards
> Yan Cheng CHEOK
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.