Re: Function array_agg(array)

From: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Function array_agg(array)
Date: 2014-10-24 08:24:03
Message-ID: CACQjQLoGAS6uz6RZxQ=4vNYojZenkRpn28aBM=tcAL-gpHwwpQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Updated patch attached.

2014-10-22 20:51 GMT+07:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
>>>>
>>>>> I agree with your proposal. I have a few comments to design:
>>>>>
>>>>
>>>>> 1. patch doesn't hold documentation and regress tests, please append
>>>>> it.
>>>>>
>>>>
i've added some regression tests in arrays.sql and aggregate.sql.

I've only found the documentation of array_agg. Where is the doc for
array(subselect) defined?

> 2. this functionality (multidimensional aggregation) can be interesting
>>>>> more times, so maybe some interface like array builder should be preferred.
>>>>>
>>>> We already have accumArrayResult and makeArrayResult/makeMdArrayResult,
>>>> haven't we?
>>>>
>>>> Actually array_agg(anyarray) can be implemented by using
>>>> accumArrayResult and makeMdArrayResult, but in the process we will
>>>> deconstruct the array data and null bit-flags into ArrayBuildState->dvalues
>>>> and dnulls. And we will reconstruct it through makeMdArrayResult. I think
>>>> this way it's not efficient, so i decided to reimplement it and memcpy the
>>>> array data and null flags as-is.
>>>>
>>>
>>> aha, so isn't better to fix a performance for accumArrayResult ?
>>>
>>
>> Ok, i'll go this route. While reading the code of array(subselect) as you
>> pointed below, i think the easiest way to implement support for both
>> array_agg(anyarray) and array(subselect) is to change accumArrayResult so
>> it operates both with scalar datum(s) and array datums, with performance
>> optimization for the latter.
>>
>
implemented it by modifying ArrayBuildState to ArrayBuildStateArray and
ArrayBuildStateScalar (do you have any idea for better struct naming?)

> In other places, i think it's clearer if we just use accumArrayResult and
>>>> makeArrayResult/makeMdArrayResult as API for building (multidimensional)
>>>> arrays.
>>>>
>>>>
>>>>> 3. array_agg was consistent with array(subselect), so it should be
>>>>> fixed too
>>>>>
>>>>> postgres=# select array_agg(a) from test;
>>>>> array_agg
>>>>> -----------------------
>>>>> {{1,2,3,4},{1,2,3,4}}
>>>>> (1 row)
>>>>>
>>>>> postgres=# select array(select a from test);
>>>>> ERROR: could not find array type for data type integer[]
>>>>>
>>>>
>>>> I'm pretty new in postgresql development. Can you point out where is
>>>> array(subselect) implemented?
>>>>
>>>
>>> where you can start?
>>>
>>> postgres=# explain select array(select a from test);
>>> ERROR: 42704: could not find array type for data type integer[]
>>> LOCATION: exprType, nodeFuncs.c:116
>>>
>>> look to code ... your magic keyword is a ARRAY_SUBLINK .. search in
>>> postgresql sources this keyword
>>>
>>
> attention: probably we don't would to allow arrays everywhere.
>

I've changed the places where i think it's appropriate.

> 4. why you use a magic constant (64) there?
>>>>>
>>>>> + astate->abytes = 64 * (ndatabytes == 0 ? 1 : ndatabytes);
>>>>> + astate->aitems = 64 * nitems;
>>>>>
>>>>> + astate->nullbitmap = (bits8 *)
>>>>> + repalloc(astate->nullbitmap, (astate->aitems + 7) /
>>>>> 8);
>>>>>
>>>>
>>>> just follow the arbitrary size choosen in accumArrayResult
>>>> (arrayfuncs.c):
>>>> astate->alen = 64; /* arbitrary starting array size */
>>>>
>>>> it can be any number not too small and too big. Too small, and we will
>>>> realloc shortly. Too big, we will end up wasting memory.
>>>>
>>>
>>> you can try to alloc 1KB instead as start -- it is used more times in
>>> Postgres. Then a overhead is max 1KB per agg call - what is acceptable.
>>>
>>> You take this value from accumArrayResult, but it is targeted for
>>> shorted scalars - you should to expect so any array will be much larger.
>>>
>>
this patch allocates 1KB (1024 bytes) if the ndatabytes is < 512bytes. If
it is larger, it allocates 4 * size. For nullbitmap, it allocates 4 *
number of items in array.

Regards,
--
Ali Akbar

Attachment Content-Type Size
array_agg_anyarray-4.patch text/x-patch 15.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2014-10-24 08:29:28 Re: alter user/role CURRENT_USER
Previous Message Amit Kapila 2014-10-24 07:18:13 Re: Wait free LW_SHARED acquisition - v0.2