Re: question about window function in C

From: Dan S <strd911(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: question about window function in C
Date: 2014-12-27 11:24:25
Message-ID: CAPpdapdAiKKnGJXE0X0XdGXEgVj5n8bE_9VSw7L+rFbTNWf3Aw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

2014-12-27 6:43 GMT+01:00 Merlin Moncure <mmoncure(at)gmail(dot)com>:

> On Fri, Dec 26, 2014 at 11:41 PM, Merlin Moncure <mmoncure(at)gmail(dot)com>
> wrote:
> > On Fri, Dec 26, 2014 at 1:19 PM, Dan S <strd911(at)gmail(dot)com> wrote:
> >> Well I'm trying to implement a window-function that works on
> range_types and
> >> produces 'atomic ranges' for each input range.
> >> Let's say I have a set of ranges some overlapping some not, and I want
> to
> >> split each range at every boundary of every overlapping range and return
> >> those.
> >> So for each range r I want to return an array of ranges that consists of
> >> range r split at every overlapping range boundary.
> >> I need to consider both upper and lower boundaries and to be able to do
> this
> >> in one pass over the data I need to sort both the upper and lower
> boundaries
> >> and also sort the original ranges so I can loop over the ranges and
> >> boundaries in lockstep to produce the arrays.
> >> As a last step I sort back the arrays of atomic ranges in the original
> order
> >> in the window so I can read out the tuplesort in that order and return
> each
> >> array to its corresponding range r.
> >>
> >> (The result can be used to answer questions like what are the maximum
> number
> >> of simultaneously overlapping ranges and at which ranges the maximum
> occurs)
>
> shoot -- sorry for empty mail -- misclick.
>
> anyways, can you give some precise examples of what you want to occur?
> for example, a set of sql taking actual inputs and the output you
> would like to see...
>
> merlin
>

Say I have theses ranges (in a table or as output of a sql statement):

[2000-01-01,2000-05-01)
[2000-02-01,2003-07-01)
[2002-06-01,2003-03-01)
[2003-04-01,2004-08-01)

then for the first range [2000-01-01,2000-05-01) I want the array output :
{[2000-01-01,2000-02-01),[2000-02-01,2000-05-01)}

for the next range [2000-02-01,2003-07-01) I want the array output:
{[2000-02-01,2000-05-01),[2000-05-01,2002-06-01),[2002-06-01,2003-03-01),[2003-03-01,2003-04-01),[2003-04-01,2003-07-01)}

and for the third range [2002-06-01,2003-03-01) I want the output:
{[2002-06-01,2003-03-01)}

and for the last range [2003-04-01,2004-08-01) the output should be:
{[2003-04-01,2003-07-01),[2003-07-01,2004-08-01)}

As you can see each input range is split up at each boundary where it is
overlapped by the other ranges in the set.
I have written a window function which does this.
On the first call into the window function it calculates all the split up
ranges and stores the resulting arrays into a tuplesort and sort these back
into the original order in the window function.
On subsequent calls into the window function it pulls one array of ranges
at a time from the tuplesort and returns that as output.
Then on the last call into the window function I need to release the
tuplesort so to not leak resources.

The problem is that I can't determine which is the last call into the
function so I can't release the tuplesort at the appropriate time.
I need a way to determine when to release the tuplesort, for example a way
to register a callback at appropriate time.

Sql wise the table can look like this and say I have the 4 ranges above in
the table.

create table t1 (id int, dr daterange);

and the query with my window function: select id, dr,
window_range_split(dr) over () from t1
this query will exhaust the whole window and I can determine when to
release the tuplesort, I release it when I pull out the last tuple in the
window.

However this query won't work: select id, dr, window_range_split(dr) over
() from t1 limit 2
the number of tuples in the window is 4 and I have no idea that there are
only going to be 2 calls into the window function and so I fail release the
tuplesort on the second call.

I should also mention that it's not specifically dateranges I'm targeting.
the window function takes anyrange as argument type and returns anyarray.

Best Regards
Dan S

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Viktor Shitkovskiy 2014-12-28 08:02:09 Rollback on include error in psql
Previous Message Merlin Moncure 2014-12-27 05:43:00 Re: question about window function in C