From: | Gregory Stark <stark(at)enterprisedb(dot)com> |
---|---|
To: | "Neil Conway" <neilc(at)samurai(dot)com> |
Cc: | "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: RFC: array_agg() per SQL:200n |
Date: | 2008-01-28 09:57:23 |
Message-ID: | 87abmqs264.fsf@oxford.xeocode.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
"Neil Conway" <neilc(at)samurai(dot)com> writes:
> AFAIK the conclusion reached by the previous thread was that to be type
> safe, you'd need one distinct pseudotype per aggregate function, along
> with some way to let the planner distinguish this class of pseudotypes
> from other types (in order to apply the heuristic that functions like
> these are likely to consume more memory). You could identify this class
> by an additional column in pg_type, but I think we'd need a lot of
> machinery to do this properly (e.g. to allow these types to be created
> via SQL). I wonder if this isn't over-engineering: the simple approach
> originally followed by Stephen Frost was to declare the transition value
> as, say, int8, and just disallow the transition and final functions from
> being called outside an aggregate context. AFAIK this would be safe,
> although of course it is ugly.
The alternative is to use the regular array type and have the implementation
of it have some magic behind the scenes.
I was already thinking we might need some magic like this for read-only cases
like:
select * where i in array[1,3,5,...]
or
for i in 1..n
var = arrayvar[i]
...
end
Both of these are O(n^2) (assuming the size of the array and the number of
loop iterations are both n). Each array IN scan or index lookup is O(n).
These cases might be easier to deal with, my idea was to memoize the array
contents in a hash data structure referenced by the parse tree fnextra
pointer. The array functions would check their function call site's fnextra
pointer to see if the array has previously been cached in the more efficient
form and is the same array and then use either hash probes for the IN case or
a C datum array for the latter case.
Could the same be done by the aggregate call site where the aggregate's type
is a plain anyarray like normal, but the array_accum call would look at the
call site and stash the actual contents there in a linked list or tuplesort?
The actual anyarray data type would just have a flag saying "the data's over
there".
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning
From | Date | Subject | |
---|---|---|---|
Next Message | Florian Weimer | 2008-01-28 10:37:03 | Re: plperl: Documentation on BYTEA decoding is wrong |
Previous Message | Russell Smith | 2008-01-28 08:59:18 | Re: Proposed patch: synchronized_scanning GUC variable |