Re: Alternative to UPDATE (As COPY to INSERT)

From: Alban Hertroys <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>
To: Yan Cheng CHEOK <yccheok(at)yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Alternative to UPDATE (As COPY to INSERT)
Date: 2010-02-23 12:19:13
Message-ID: ECEC31F4-B351-4C48-8455-34559EB3AD73@solfertje.student.utwente.nl
Lists: pgsql-general

On 23 Feb 2010, at 10:26, Yan Cheng CHEOK wrote:

> I realize the UPDATE operation speed in PostgreSQL doesn't meet my expectations.
>
> Is there any fast alternative to UPDATE, the way COPY is a fast alternative to INSERT?

Well, since an UPDATE is just a DELETE + INSERT and you're already doing this in one transaction, you could use DELETE + COPY instead. That's not as easy to do with your current approach though - converting those arrays to something COPY understands will probably take about as much time as your function is taking now.
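
For example, the direct variant might look like this (a rough sketch; the column list mirrors the INSERT in your function below, and the COPY data would be streamed from your client, ending with a \. line):

BEGIN;
-- Remove the rows that are about to be replaced for this lot.
DELETE FROM statistic WHERE fk_lot_id = 1 AND measurement_type IN ('Pad1 Area', 'Pad1 Pitch');
-- Bulk-load the replacement rows.
COPY statistic (fk_lot_id, measurement_type, statistic_type, value) FROM STDIN;
COMMIT;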

It would probably be convenient to use a staging table (maybe temporary) to put the new values in before you act on them, so you don't need that hassle with arrays. I don't know where your data is coming from (I recall you work with a measurement machine in a capacitor plant?), but you could use COPY to fill the staging table (no constraints, yay!) and then:

BEGIN;
DELETE FROM statistic WHERE (fk_lot_id, measurement_type, statistic_type) IN (SELECT fk_lot_id, measurement_type, statistic_type FROM staging_table);
INSERT INTO statistic (value, fk_lot_id, measurement_type, statistic_type) SELECT value, fk_lot_id, measurement_type, statistic_type FROM staging_table;
TRUNCATE staging_table;
COMMIT;
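
The staging table itself can be dead simple. A sketch (the column layout here is an assumption, mirroring the statistic table):

CREATE TABLE staging_table (
    fk_lot_id        integer,
    measurement_type text,
    statistic_type   text,
    value            double precision
);
-- No constraints or indexes, so the bulk load is as cheap as it gets.
COPY staging_table (fk_lot_id, measurement_type, statistic_type, value) FROM STDIN;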

This isn't concurrency-safe, so you need to make sure no values are added to the staging table while you're doing the above.
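
One way to get that (a sketch): take an exclusive lock on the staging table at the start of the transaction, so concurrent writers simply block until you commit:

BEGIN;
-- EXCLUSIVE mode blocks concurrent INSERTs into staging_table
-- while still allowing reads.
LOCK TABLE staging_table IN EXCLUSIVE MODE;
-- ... the DELETE / INSERT / TRUNCATE from above ...
COMMIT;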

I'm not sure you'll get to <1 ms doing this; that's a pretty steep requirement. But considering you can do it the "slow" way in 20 ms, it might just work.

Of course, if you have a staging table you could choose to operate on it less frequently; that would give you more time per batch and reduce the overhead a little. I'm guessing your data comes in 24/7, so finding the right batch size is part of your problem.

> Thanks!
>
> I am using UPDATE in the following case:
>
> CREATE OR REPLACE FUNCTION update_or_insert_statistic(integer, text[], text[], double precision[])
> RETURNS void AS
> $BODY$DECLARE
>     _lotID ALIAS FOR $1;
>     _measurementTypes ALIAS FOR $2;
>     _statisticTypes ALIAS FOR $3;
>     _values ALIAS FOR $4;
>     _row_count int;
>     i int;
> BEGIN
>     -- Parameter validation: all three arrays must have the same length.
>     IF array_upper(_measurementTypes, 1) != array_upper(_statisticTypes, 1) OR array_upper(_measurementTypes, 1) != array_upper(_values, 1) THEN
>         RAISE EXCEPTION 'Inconsistency in array size';
>     END IF;
>
>     FOR i IN SELECT generate_subscripts(_measurementTypes, 1)
>     LOOP
>         EXECUTE 'UPDATE statistic SET value = $1 WHERE fk_lot_id = $2 AND measurement_type = $3 AND statistic_type = $4'
>             USING _values[i], _lotID, _measurementTypes[i], _statisticTypes[i];
>
>         GET DIAGNOSTICS _row_count = ROW_COUNT;
>
>         -- No row matched, so this statistic doesn't exist yet: insert it.
>         IF _row_count = 0 THEN
>             EXECUTE 'INSERT INTO statistic(fk_lot_id, "value", measurement_type, statistic_type) VALUES ($1, $2, $3, $4)'
>                 USING _lotID, _values[i], _measurementTypes[i], _statisticTypes[i];
>         END IF;
>     END LOOP;
> END;$BODY$
> LANGUAGE plpgsql;
>
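
As an aside: on 8.4 and later you could drop the loop entirely and go set-based with unnest(), avoiding one EXECUTE round trip per array element. A sketch (untested; not concurrency-safe either, and it assumes equal-length arrays since the explicit check is gone):

CREATE OR REPLACE FUNCTION update_or_insert_statistic(integer, text[], text[], double precision[])
RETURNS void AS
$BODY$
    -- Expand the three parallel arrays into rows and update every
    -- matching statistic in a single statement.
    UPDATE statistic s
       SET value = n.val
      FROM (SELECT unnest($2) AS mtype, unnest($3) AS stype, unnest($4) AS val) n
     WHERE s.fk_lot_id = $1
       AND s.measurement_type = n.mtype
       AND s.statistic_type = n.stype;

    -- Insert the combinations the UPDATE didn't touch.
    INSERT INTO statistic (fk_lot_id, measurement_type, statistic_type, value)
    SELECT $1, n.mtype, n.stype, n.val
      FROM (SELECT unnest($2) AS mtype, unnest($3) AS stype, unnest($4) AS val) n
     WHERE NOT EXISTS (
           SELECT 1 FROM statistic s
            WHERE s.fk_lot_id = $1
              AND s.measurement_type = n.mtype
              AND s.statistic_type = n.stype);
$BODY$ LANGUAGE sql;
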
> I use the following "minimal" version:
>
> SELECT * FROM update_or_insert_statistic(1,
>     array['Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area','Pad1 Area',
>           'Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch','Pad1 Pitch'],
>     array['LSL','USL','Nominal','Average','StdDev','StdDev3','CPK','Min','Max','Total','Valid','Invalid','Variance',
>           'LSL','USL','Nominal','Average','StdDev','StdDev3','CPK','Min','Max','Total','Valid','Invalid','Variance'],
>     array[0,0,0,4.94422589800714,3.16063453753607,0,0,1.01620532853175,9.98406933805353,20,20,0,9.98961067986587,
>           0,0,0,6.56297341837825,2.51273949943937,0,0,1.69188512833033,9.56794946134831,20,20,0,6.31385979204282])
>
> It takes around 20ms :(
>
> I am expecting < 1ms
>
> Or shall I just go back to plain text in this case?
>
> Thanks and Regards
> Yan Cheng CHEOK

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.
