Re: Converting row elements into a arrays?

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Ron <ronljohnsonjr(at)gmail(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Converting row elements into a arrays?
Date: 2023-03-04 06:01:59
Message-ID: CAFj8pRB1UpqgSQe4Daa4CCZP6Kj+rVRNGrVeG5+5o4GtgKcBKQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

pá 3. 3. 2023 v 21:51 odesílatel Merlin Moncure <mmoncure(at)gmail(dot)com> napsal:

> On Thu, Mar 2, 2023 at 3:47 PM Ron <ronljohnsonjr(at)gmail(dot)com> wrote
>
>> I'm used to the bog standard COUNT, AVG, MIN, MAX, SUM. It didn't occur
>> to me that there would be others...
>>
>
> wait until you find out you can write your own:
>
> CREATE OR REPLACE FUNCTION agg_leftagg(l TEXT, r anyelement) returns text
> as
> $$
> BEGIN
> RETURN CASE
> WHEN l IS NOT NULL THEN format('%s-%s', l, r)
> ELSE r::TEXT
> END;
> END;
> $$ LANGUAGE PLPGSQL;
>
> CREATE AGGREGATE leftagg(anyelement) (SFUNC=agg_leftagg, STYPE=TEXT);
>
> CREATE TEMP TABLE s AS SELECT generate_series(1,10) s;
>
> SELECT leftagg(s) FROM s GROUP BY random() > .5;
>
> postgres=# SELECT leftagg(s) FROM s GROUP BY random() > .5;
> leftagg
> ────────────
> 2-3-5-6-10
> 1-4-7-8-9
> (2 rows)
>
>
This can work, but can be slower for large data

fast (although not too effect :-)) way

(2023-03-04 06:22:56) postgres=# CREATE TEMP TABLE s AS SELECT
generate_series(1,10) s;
SELECT 10
(2023-03-04 06:22:57) postgres=# SELECT array_agg(s) FROM s GROUP BY
random() > .5;
┌──────────────┐
│ array_agg │
╞══════════════╡
│ {3,6,8,9,10} │
│ {1,2,4,5,7} │
└──────────────┘
(2 rows)

(2023-03-04 06:23:21) postgres=# SELECT array_to_string(array_agg(s), '-')
FROM s GROUP BY random() > .5;
┌──────────────────┐
│ array_to_string │
╞══════════════════╡
│ 1-2 │
│ 3-4-5-6-7-8-9-10 │
└──────────────────┘
(2 rows)

performance comparison on 1mil rows (but with enabled asserts), I modified
the query for returning 100 groups because building extra long strings are
slow

(2023-03-04 06:33:04) postgres=# EXPLAIN ANALYZE SELECT
array_to_string(array_agg(s), '-') FROM s GROUP BY (random() * 100)::int;
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN

╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ GroupAggregate (cost=135257.34..165257.34 rows=1000000 width=36) (actual
time=715.400..1128.007 rows=101 loops=1) │
│ Group Key: (((random() * '100'::double precision))::integer)

│ -> Sort (cost=135257.34..137757.34 rows=1000000 width=8) (actual
time=712.689..853.335 rows=1000000 loops=1) │
│ Sort Key: (((random() * '100'::double precision))::integer)

│ Sort Method: external merge Disk: 17664kB

│ -> Seq Scan on s (cost=0.00..21925.00 rows=1000000 width=8)
(actual time=6.135..192.553 rows=1000000 loops=1) │
│ Planning Time: 0.082 ms

│ JIT:

│ Functions: 7

│ Options: Inlining false, Optimization false, Expressions true,
Deforming true │
│ Timing: Generation 0.484 ms, Inlining 0.000 ms, Optimization 0.276 ms,
Emission 5.877 ms, Total 6.637 ms │
│ Execution Time: 1133.816 ms

└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(12 rows)

array_to_string(array_agg()) .. 1sec
agg_leftagg .. 27 sec

using final function can be faster

create or replace function agg_leftagg_final(anycompatiblearray)
returns text as $$
begin
return array_to_string($1, '-');
end;
$$ language plpgsql;

CREATE AGGREGATE leftagg2(anycompatible) (SFUNC=array_append, STYPE =
anycompatiblearray, INITCOND = '{}', FINALFUNC = agg_leftagg_final);

(2023-03-04 06:57:18) postgres=# EXPLAIN ANALYZE SELECT leftagg2(s) FROM s
GROUP BY (random() * 100)::int;
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN

╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ HashAggregate (cost=78175.00..353487.50 rows=1000000 width=36) (actual
time=436.202..540.029 rows=101 loops=1) │
│ Group Key: ((random() * '100'::double precision))::integer

│ Planned Partitions: 256 Batches: 1 Memory Usage: 11930kB

│ -> Seq Scan on s (cost=0.00..21925.00 rows=1000000 width=8) (actual
time=5.710..174.016 rows=1000000 loops=1) │
│ Planning Time: 0.231 ms

│ JIT:

│ Functions: 7

│ Options: Inlining false, Optimization false, Expressions true,
Deforming true │
│ Timing: Generation 1.491 ms, Inlining 0.000 ms, Optimization 0.240 ms,
Emission 5.471 ms, Total 7.202 ms │
│ Execution Time: 542.007 ms

└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(10 rows)

Time: 543,101 ms

It is a little bit surprising so significantly leftagg2 is faster than the
array_to_string(array_agg()) variant.

Regards

Pavel

this is why I fell in love with postgres 20 years ago, and never looked back
>
> merlion
>
>
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Peter J. Holzer 2023-03-04 08:03:03 Re: Dropping behavior for unique CONSTRAINTs
Previous Message David Rowley 2023-03-04 00:50:28 Re: Dropping behavior for unique CONSTRAINTs