How to speedup intarray aggregate function?

From: "Dmitry Koterov" <dmitry(at)koterov(dot)ru>
To: "Postgres General" <pgsql-general(at)postgresql(dot)org>
Subject: How to speedup intarray aggregate function?
Date: 2007-10-09 23:01:58
Message-ID: d7df81620710091601j7ed12403o3e11523dc3f0f82f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello.

I created an aggregate:

CREATE AGGREGATE intarray_aggregate_push (_int4)
(
STYPE = _int4,
SFUNC = intarray_push_array,
INITCOND = '{}'
);

(or - I may use _int_union instead of intarray_push_array, its speed is
practically the same in my case).
This aggregate merges together a list of integer[] arrays resulting one big
array with all elements.

Then I want to use this aggregate:

SELECT intarray_aggregate_push(arrayfield)
FROM arraytable

The table arraytable contains a lot of rows (about 5000), each row has array
with length of 5-10 elements, so - the resulting array should contain about
50000 elements.

The query is okay, but its speed is too bad: about 1 second.

The main problem is the speed of intarray_aggregate_push function - it is
quite slow, because intarray_push_array reallocates the memory each time I
merge two arrays. I am pretty sure that the reallocaton and copying is the
bottleneck, because if I use another dummy aggreate:

CREATE AGGREGATE intarray_aggregate_dummy (_int4)
(
STYPE = _int4,
SFUNC = dummy,
INITCOND = '{}'
);

CREATE OR REPLACE FUNCTION "public"."dummy" (a integer [], b integer [])
RETURNS integer [] AS
$body$ BEGIN RETURN a; END; $body$
LANGUAGE 'plpgsql' VOLATILE CALLED ON NULL INPUT SECURITY INVOKER;

where dummy() is the function which returns its first argument without any
modification, the speed grows dramatically - about 25 ms (instead of 1000
ms!).

The question is: how could I optimize this, and is it possible at all in
Postgres? I just want to get one large array glued from a lot of smaller
arrays...

P.S.

I have tested that

SELECT array_to_string(ARRAY(SELECT text FROM tbl), ' ')

query is many times faster than joining of all "text" fields inside one
pg/plsql stored function (I assume that it is because Postgres do not
reallocate & copy memory each time it glues a new text piece). But
unfortunately there is no way to convert integer[] to string to use this
method: I could write

select '{1,2}'::integer[]

but I couldn't use

select ARRAY[1,2]::text

Responses

Browse pgsql-general by date

  From Date Subject
Next Message pgsql.gen 2007-10-10 00:25:23 Using C API
Previous Message Erik Jones 2007-10-09 22:37:10 Re: Generating subtotal reports direct from SQL