On Wed, Apr 27, 2011 at 18:06, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> On Wed, Apr 27, 2011 at 12:00 PM, Adrian Schreyer <ams214(at)cam(dot)ac(dot)uk> wrote:
>> The largest arrays I expect at the moment are more-or-less sparse
>> vectors of around 4.8k elements, and I have noticed that the array
>> input/output in my C/C++ extension does not scale well with the
>> number of elements.
>>
>> Using a function that sums all elements of the array, these are the
>> times it takes for ~150k arrays of various sizes (including ordering
>> DESC and LIMIT 10):
>>
>> 128:  61ms
>> 256:  80ms
>> 512:  681ms
>> 1024: 1065ms
>> 2048: 7682ms
>> 4096: 21332ms
>
> hm, I'm not following you exactly -- what SQL are you running? This
> scales pretty well for me:
> select array_dims(array(select generate_series(1,1000000)));
> etc
>
> merlin
>
I have a C extension function that creates _int4 arrays of a specified
size with random elements, in this case 128, 256, 512, etc. Another
function from my extension returns the sum of such an array. For the
benchmark I created a table with around 150k of these arrays. The
query sums each array in the table and returns the 10 largest sums.
The C extension is actually a wrapper around the Eigen 3 template
library, which works pretty well. Now I am trying to tweak the
input/output functions to get better performance with larger arrays.
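
For reference, here is a stripped-down sketch of the kind of sum
function I mean. This is not my actual code (the real function hands
the data to Eigen); it just walks the array with the plain PostgreSQL
array API, and the function/table names are made up for the example:

#include "postgres.h"
#include "fmgr.h"
#include "utils/array.h"
#include "catalog/pg_type.h"

PG_MODULE_MAGIC;

/* Assumed SQL declaration and benchmark query shape:
 *   CREATE FUNCTION int4_array_sum(int4[]) RETURNS int8
 *     AS 'MODULE_PATHNAME' LANGUAGE C IMMUTABLE STRICT;
 *   SELECT int4_array_sum(arr) FROM vectors ORDER BY 1 DESC LIMIT 10;
 */
PG_FUNCTION_INFO_V1(int4_array_sum);

Datum
int4_array_sum(PG_FUNCTION_ARGS)
{
    ArrayType  *arr = PG_GETARG_ARRAYTYPE_P(0);
    int32      *data;
    int64       sum = 0;
    int         nitems;
    int         i;

    /* keep the sketch simple: 1-D int4 array, no NULL elements */
    if (ARR_NDIM(arr) != 1 || ARR_HASNULL(arr) ||
        ARR_ELEMTYPE(arr) != INT4OID)
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("expected 1-D int4 array without nulls")));

    nitems = ArrayGetNItems(ARR_NDIM(arr), ARR_DIMS(arr));
    data = (int32 *) ARR_DATA_PTR(arr);

    for (i = 0; i < nitems; i++)
        sum += data[i];

    PG_RETURN_INT64(sum);
}

Note that PG_GETARG_ARRAYTYPE_P detoasts the array on every call, so
there is a per-call cost that grows with the array size even before
the loop runs; that is one of the things I am looking at in the real
input/output path.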