Performance bottleneck due to array manipulation

From: Genc, Ömer <Oemer(dot)Genc(at)iais(dot)fraunhofer(dot)de>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Performance bottleneck due to array manipulation
Date: 2015-08-21 12:48:37
Message-ID: a8f537986a1b4dfe8cc9d3d94822664f@e2k13-ms1.iais.fraunhofer.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hey,

i have a very long running stored procedure, due to array manipulation in a stored procedure. The following procedure takes 13 seconds to finish.

BEGIN
point_ids_older_than_one_hour := '{}';
object_ids_to_be_invalidated := '{}';

select ARRAY(SELECT
point_id
from ONLY
public.ims_point as p
where
p.timestamp < m_before_one_hour
)
into point_ids_older_than_one_hour ; -- this array has a size of 20k

select ARRAY(SELECT
object_id
from
public.ims_object_header h
WHERE
h.last_point_id= ANY(point_ids_older_than_one_hour)
)
into object_ids_to_be_invalidated; -- this array has a size of 100

-- current_last_point_ids will have a size of 100k
current_last_point_ids := ARRAY( SELECT
last_point_id
from
public.ims_object_header h
);
-- START OF PERFORMANCE BOTTLENECK
IF(array_length(current_last_point_ids, 1) > 0)
THEN
FOR i IN 0 .. array_upper(current_last_point_ids, 1)
LOOP
point_ids_older_than_one_hour = array_remove(point_ids_older_than_one_hour, current_last_point_ids[i]::bigint);
END LOOP;
END IF;
-- END OF PERFORMANCE BOTTLENECK
END;

The array manipulation part is the performance bottleneck. I am pretty sure, that there is a better way of doing this, however I couldn't find one.
What I have is two table, lets call them ims_point and ims_object_header. ims_object_header references some entries of ims_point in the column last_point_id.
Now I want to delete all entries from ims_point, where the timestamp is older than one hour. The currently being referenced ids of the table ims_object_header should be excluded from this deletion. Therefore I stored the ids in arrays and iterate over those arrays to exclude the referenced values from being deleted.

However, I not sure if using an array for an operation like this is the best approach.

Can anyone give me some advice how this could be enhanced.

Thanks in advance.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2015-08-21 14:06:38 Re: Performance bottleneck due to array manipulation
Previous Message David G. Johnston 2015-08-21 12:24:37 Re: Most efficient way of querying M 'related' tables where N out of M may contain the key