Re: Postgres array parser

From: "Marc Mamin" <M(dot)Mamin(at)intershop(dot)de>
To: "Aleksej Trofimov" <aleksej(dot)trofimov(at)ruptela(dot)lt>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Postgres array parser
Date: 2011-12-14 10:27:10
Message-ID: C4DAC901169B624F933534A26ED7DF310861B23C@JENMAIL01.ad.intershop.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> Yes, it would be great, but I haven't found such a function, which
> splits 2 dimensional array into rows =) Maybe we'll modify existing
> function, but unfortunately we have tried hstore type and function in
> postgres and we see a significant performance improvements. So we only
> need to convert existing data into hstore and I think this is a good
> solution.

I haven't tested hstore yet, but I would be interested to find out if it still better perform with custom "numeric" aggregates on the hstore values.

I've made a short "proof of concept" test with a custom key/value type to achieve such an aggregation.
Something like:

SELECT x, distinct_sum( (currency,amount)::keyval ) overview FROM ... GROUP BY x

x currency amount
a EUR 15.0
a EUR 5.0
a CHF 7.5
b USD 12.0
=>

x overview
- --------
a {(EUR,20.0), (CHF,7.5)}
b {(USD,10.0)}

regards,

Marc Mamin


> On 12/14/2011 11:21 AM, Marc Mamin wrote:
> > Hello,
> >
> > For such cases (see below), it would be nice to have an unnest
> function that only affect the first array dimension.
> >
> > Something like
> >
> > unnest(ARRAY[[1,2],[2,3]], SLICE=1)
> > =>
> > unnest
> > ------
> > [1,2]
> > [2,3]
> >
> >
> > With this function, I imagine that following sql function
> > might beat the plpgsql FOREACH version.
> >
> >
> > CREATE OR REPLACE FUNCTION input_value_un (in_inputs numeric[],
> in_input_nr numeric)
> > RETURNS numeric AS
> > $BODY$
> >
> > SELECT u[1][2]
> > FROM unnest($1, SLICE =1) u
> > WHERE u[1][1]=in_input_nr
> > LIMIT 1;
> >
> > $BODY$
> > LANGUAGE sql IMMUTABLE;
> >
> >
> >
> > best regards,
> >
> > Marc Mamin
> >
> >
> >> -----Original Message-----
> >> From: pgsql-performance-owner(at)postgresql(dot)org [mailto:pgsql-
> performance-
> >> owner(at)postgresql(dot)org] On Behalf Of Pavel Stehule
> >> Sent: Dienstag, 13. Dezember 2011 15:43
> >> To: Aleksej Trofimov
> >> Cc: pgsql-performance(at)postgresql(dot)org
> >> Subject: Re: [PERFORM] Postgres array parser
> >>
> >> Hello
> >>
> >> 2011/12/13 Aleksej Trofimov<aleksej(dot)trofimov(at)ruptela(dot)lt>:
> >>> We have tried foreach syntax, but we have noticed performance
> >> degradation:
> >>> Function with for: 203ms
> >>> Function with foreach: ~250ms:
> >>>
> >>> there is functions code:
> >>> CREATE OR REPLACE FUNCTION input_value_fe(in_inputs numeric[],
> >> in_input_nr
> >>> numeric)
> >>> RETURNS numeric AS
> >>> $BODY$
> >>> declare i numeric[];
> >>> BEGIN
> >>> FOREACH i SLICE 1 IN ARRAY in_inputs
> >>> LOOP
> >>> if i[1] = in_input_nr then
> >>> return i[2];
> >>> end if;
> >>> END LOOP;
> >>>
> >>> return null;
> >>> END;
> >>> $BODY$
> >>> LANGUAGE plpgsql VOLATILE
> >>> COST 100;
> >>>
> >>> CREATE OR REPLACE FUNCTION input_value(in_inputs numeric[],
> >> in_input_nr
> >>> numeric)
> >>> RETURNS numeric AS
> >>> $BODY$
> >>> declare
> >>> size int;
> >>> BEGIN
> >>> size = array_upper(in_inputs, 1);
> >>> IF size IS NOT NULL THEN
> >>>
> >>> FOR i IN 1 .. size LOOP
> >>> if in_inputs[i][1] = in_input_nr then
> >>> return in_inputs[i][2];
> >>> end if;
> >>> END LOOP;
> >>> END IF;
> >>>
> >>> return null;
> >>> END;
> >>> $BODY$
> >>> LANGUAGE plpgsql VOLATILE
> >>> COST 100;
> >>>
> >>>
> >>> On 12/13/2011 04:02 PM, Pavel Stehule wrote:
> >>>> Hello
> >>>>
> >>>> do you know FOREACH IN ARRAY statement in 9.1
> >>>>
> >>>> this significantly accelerate iteration over array
> >>>>
> >>>>
> >>>> http://www.depesz.com/index.php/2011/03/07/waiting-for-9-1-
> foreach-
> >> in-array/
> >>>>
> >>>>
> >>>> 2011/12/13 Aleksej Trofimov<aleksej(dot)trofimov(at)ruptela(dot)lt>:
> >>>>> Hello, I wanted to ask according such a problem which we had
> faced
> >> with.
> >>>>> We are widely using postgres arrays like key->value array by
> doing
> >> like
> >>>>> this:
> >>>>>
> >>>>> {{1,5},{2,6},{3,7}}
> >>>>>
> >>>>> where 1,2,3 are keys, and 5,6,7 are values. In our pgSql
> functions
> >> we are
> >>>>> using self written array_input(array::numeric[], key::numeric)
> >> function
> >>>>> which makes a loop on whole array and searches for key like
> >>>>> FOR i IN 1 .. size LOOP
> >>>>> if array[i][1] = key then
> >>>>> return array[i][2];
> >>>>> end if;
> >>>>> END LOOP;
> >>>>>
> >>>>> But this was a good solution until our arrays and database had
> >> grown. So
> >>>>> now
> >>>>> FOR loop takes a lot of time to find value of an array.
> >>>>>
> >>>>> And my question is, how this problem of performance could be
> >> solved? We
> >>>>> had
> >>>>> tried pgperl for string parsing, but it takes much more time than
> >> our
> >>>>> current solution. Also we are thinking about self-written C++
> >> function,
> >>>>> may
> >>>>> be someone had implemented this algorithm before?
> >>>>>
> >>>> you can use indexes or you can use hstore
> >>>>
> >>>> Regards
> >>>>
> >>>> Pavel Stehule
> >>>>
> >>>>> --
> >>>>> Best regards
> >>>>>
> >>>>> Aleksej Trofimov
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Sent via pgsql-performance mailing list
> >>>>> (pgsql-performance(at)postgresql(dot)org)
> >>>>> To make changes to your subscription:
> >>>>> http://www.postgresql.org/mailpref/pgsql-performance
> >>>
> >> It is strange - on my comp FOREACH is about 2x faster
> >>
> >> postgres=# select input_value(array(select
> >> generate_series(1,1000000)::numeric), 100000);
> >> input_value
> >> -------------
> >>
> >> (1 row)
> >>
> >> Time: 495.426 ms
> >>
> >> postgres=# select input_value_fe(array(select
> >> generate_series(1,1000000)::numeric), 100000);
> >> input_value_fe
> >> ----------------
> >>
> >> (1 row)
> >>
> >> Time: 248.980 ms
> >>
> >> Regards
> >>
> >> Pavel
> >>
> >>
> >> --
> >> Sent via pgsql-performance mailing list (pgsql-
> >> performance(at)postgresql(dot)org)
> >> To make changes to your subscription:
> >> http://www.postgresql.org/mailpref/pgsql-performance
>
>
> --
> Best regards
>
> Aleksej Trofimov
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-
> performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Kevin Martyn 2011-12-14 13:14:12 Re: copy vs. C function
Previous Message Aleksej Trofimov 2011-12-14 09:59:34 Re: Postgres array parser