From: | H(dot)Harada <umi(dot)tanuki(at)gmail(dot)com> |
---|---|
To: | "Simon Riggs" <simon(at)2ndquadrant(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org, "Martijn van Oosterhout" <kleptog(at)svana(dot)org> |
Subject: | Re: introduction of WIP window function patch |
Date: | 2008-07-06 08:39:11 |
Message-ID: | e08cc0400807060139m5fedd73buea68ed6ba508a95f@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
2008/7/6 Simon Riggs <simon(at)2ndquadrant(dot)com>:
>> I think there are two types of functions for windowed mode.
>> - windowed aggregate
>> this type of function is exactly same as normal aggregate. So we use
>> functions that have been in pgsql already. Actually in my patch above,
>> I didn't introduce any new function. This type of function includes
>> simply sum(), avg(), etc. which returns same values on a partition or
>> a window frame.
>>
>> - windowed function
>> this is the NEW type of function. I guess we should add a new function
>> type to pgsql. This type of function includes rank(), rank_dense(),
>> row_number(), etc. Windowed functions returns different values per
>> tuple.
>>
>> The difference between two types is if the function returns the same
>> value during a partition or different values.
>>
>> So, windowed aggregate and normal aggregate overlap each other. How
>> you know which one is that you see OVER clause in SQL just after the
>> function call. When you see OVER after func(), and pg_proc says it's
>> an aggregate, it's a windowed aggregate. Otherwise, it's a windowed
>> function.
>>
>> If I misunderstood about those definitions please correct me.
>
> Yes, I understand that and I think Martijn does also.
>
> I've done some thinking and rooting around on this and I think I have a
> different proposal for you, different to what we just discussed.
>
> SQL2008 specifies window functions as
>
> * rank functions
> * distribution functions: percent_rank() and cume_dist()
> * rownumber()
> * ntile()
> * lead() and lag()
> * first, last and n-th value functions
> * inverse distribution functions (similar to n-th value, based upon
> distribution function results)
>
> plus window aggregate functions (the normal aggregates COUNT, SUM etc)
>
> Now looking through all of those, I don't see *any* window functions
> that need access to different datatypes, or actually need to see the
> values of the attributes.
>
> The normal aggregates work with windows identically to the way they do
> without windows, so no change needed there.
>
> AFAICS we could define all of the non-aggregate window functions on the
> above list *without* defining them as functions in pg_proc. That would
> be a benefit because the window functions are very powerful and we'd
> need to give them access to any/all tuples in the window.
>
> So that would mean we don't provide a mechanism for user-defined
> windowed aggregate functions at all. Which solves the discussion about
> how to pass generic info through to them (at least long enough to get
> the first implementation done).
>
> We do already have such functions in code, e.g. greatest(). Sure they
> need to be defined in code, but we don't need to come up with a generic
> API for them.
>
> If you disagree, think about how we'd implement lag() or ntile() and
> what info we'd need to pass them.
Well, your idea is one of considerable choices. But I like pgsql's
extensibility that enables pgsql more powerful DBMS. So, I design it
as you propsed though trying to unify the function form somehow.
Just idea, how about pass window object to a function? We'll provide
window operation API then in the function you take window object
through fcinfo:
Datum func(PG_FUNCTION_ARGS)
{
Datum v;
WindowObject w = get_window(fcinfo);
HeapTuple htup_current = window_current_row(w);
HeapTuple htup_prev = window_preceding(w, 1);
/* do something */
PG_RETURN_DATUM(v);
}
so that a function access whole the window. APIs include
- current row
- preceding row
- following row
- current key
- preceding key
- following key
- iterate for the window
where "key" means ORDER BY values in OVER clause. Fortunately, my
patch uses tuplestore/tuplesort to create window, which allows random
access operation such above. Is there security/performance issue about
this?
--
Hitoshi Harada
From | Date | Subject | |
---|---|---|---|
Next Message | Simon Riggs | 2008-07-06 09:22:11 | Re: introduction of WIP window function patch |
Previous Message | David E. Wheeler | 2008-07-06 06:31:20 | PATCH: CITEXT 2.0 v2 |