From: | Ildar Musin <i(dot)musin(at)postgrespro(dot)ru> |
---|---|
To: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | MAP syntax for arrays |
Date: | 2018-05-04 13:08:03 |
Message-ID: | f86e5d8b-20a7-36ce-acd6-ac58b290c8f0@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello hackers,
Recently I was working with sql arrays in postgres and it turned out
that postgres doesn't have such very convinient functional constructions
as map, reduce and filter. Currently to map function over array user has
to make a subquery like:
select u.* from
my_table,
lateral (
select array_agg(lower(elem))
from unnest(arr) as elem
) as u;
Which is not only inconvenient but not very efficient as well (see
'Demo' section below).
When I dug into the code I found that postgres already has the needed
infrastructure for implementing map for arrays; actually array coercing
already works that way (it basically maps cast function).
In the attached patch there is a simple map implementation which
introduces new expression type and syntax:
MAP(<func_name> OVER <array_expression>)
For example:
SELECT MAP(upper OVER array['one', 'two', 'three']::text[]);
?column?
-----------------
{ONE,TWO,THREE}
(1 row)
This is probably not the most useful notation and it would be better to
have syntax for mapping arbitrary expressions over array, not just
function. I'm struggling to come up with a good idea of how it should
look like. It could look something like following:
MAP(<expr> FOR <placeholder> IN <array_expressin>)
For instance:
SELECT MAP(x*2 FOR x IN array[1, 2, 3]::int[]);
Looking forward for community's suggestions!
Demo
----
Here is a small comparison between map and unnest/aggregate ways for
per-element processing of arrays. Given a table with 1K rows which
contains single column of text[] type. Each array contains 5/10/100
elements.
create table my_table (arr text[]);
insert into my_table
select array_agg(md5(random()::text))
from generate_series(1, 1000) as rows,
generate_series(1, 10) as elements
group by rows;
There are two scripts for pgbench. One for 'map' syntax:
select map(upper over arr) from my_table;
And one for unnest/aggregate:
select u.* from my_table,
lateral (
select array_agg(upper(elem))
from unnest(arr) as elem
) as u;
Results are:
elements per array | map (tps) | unnest/aggregate (tps)
--------------------+------------+------------------------
5 | 139.105359 | 74.434010
10 | 74.089743 | 43.622554
100 | 7.693000 | 5.325805
Apparently map is more efficient for small arrays. And as the size of
array increases the difference decreases.
I'll be glad to any input from the community. Thanks!
--
Ildar Musin
i(dot)musin(at)postgrespro(dot)ru
Attachment | Content-Type | Size |
---|---|---|
map_v1.patch | text/x-diff | 18.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tatsuo Ishii | 2018-05-04 13:08:38 | Re: pg_rewind and postgresql.conf |
Previous Message | Michael Paquier | 2018-05-04 13:02:48 | Re: pg_rewind and postgresql.conf |