From: | Filip Rembiałkowski <plk(dot)zuber(at)gmail(dot)com> |
---|---|
To: | dmitry(at)koterov(dot)ru |
Cc: | "Postgres General" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: How to speedup intarray aggregate function? |
Date: | 2007-10-10 02:30:29 |
Message-ID: | 92869e660710091930p59a10882j7ffe2429ba957086@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
2007/10/10, Dmitry Koterov <dmitry(at)koterov(dot)ru>:
> Hello.
>
> I created an aggregate:
>
> CREATE AGGREGATE intarray_aggregate_push (_int4)
> (
> STYPE = _int4,
> SFUNC = intarray_push_array,
> INITCOND = '{}'
> );
>
> (or - I may use _int_union instead of intarray_push_array, its speed is
> practically the same in my case).
> This aggregate merges together a list of integer[] arrays resulting one big
> array with all elements.
>
> Then I want to use this aggregate:
>
> SELECT intarray_aggregate_push(arrayfield)
> FROM arraytable
>
> The table arraytable contains a lot of rows (about 5000), each row has
> array with length of 5-10 elements, so - the resulting array should contain
> about 50000 elements.
>
> The query is okay, but its speed is too bad: about 1 second.
>
> The main problem is the speed of intarray_aggregate_push function - it is
> quite slow, because intarray_push_array reallocates the memory each time I
> merge two arrays. I am pretty sure that the reallocaton and copying is the
> bottleneck, because if I use another dummy aggreate:
>
> CREATE AGGREGATE intarray_aggregate_dummy (_int4)
> (
> STYPE = _int4,
> SFUNC = dummy,
> INITCOND = '{}'
> );
>
> CREATE OR REPLACE FUNCTION "public"."dummy" (a integer [], b integer [])
> RETURNS integer [] AS
> $body$ BEGIN RETURN a; END; $body$
> LANGUAGE 'plpgsql' VOLATILE CALLED ON NULL INPUT SECURITY INVOKER;
>
> where dummy() is the function which returns its first argument without any
> modification, the speed grows dramatically - about 25 ms (instead of 1000
> ms!).
>
> The question is: how could I optimize this, and is it possible at all in
> Postgres? I just want to get one large array glued from a lot of smaller
> arrays...
1. no wonder copying is the bottleneck - this is what the aggregate
does, mostly.
2. you can use plain array_cat for this, in my test it is few percent faster
3. in this case I guess intarrray contrib is not an option, AFAIK it
was created only for speeding up searches, that is int4[] lookups
4. to have this kind of optimization you talk about, we would need an
aggregate operating (in this case appending) directly on
internalstate. i'm not sure if this is possible in postgres
5. my results:
your method (using intarray_push_array): 940 ms
using array_cat: 860 ms
same in PL/PgSQL: (LOOP, append) 800 ms
same thing in Perl, no database (push array of arrays into one and
print ): 18 ms
cheers, Filip
--
Filip Rembiałkowski
From | Date | Subject | |
---|---|---|---|
Next Message | Jasbinder Singh Bali | 2007-10-10 02:56:27 | SLEEP in posgresql |
Previous Message | Gregory Stark | 2007-10-10 00:32:12 | Re: Generating subtotal reports direct from SQL |