Stumped on windowing

From: artacus(at)comcast(dot)net
To: pgsql-general(at)postgresql(dot)org
Subject: Stumped on windowing
Date: 2011-06-17 17:53:07
Message-ID: 439290549.2025501.1308333187885.JavaMail.root@sz0018a.emeryville.ca.mail.comcast.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I'm working with a product that uses effective date based data structures. We then create views using analytic functions that have begin and end dates for when that record was valid. This works fine when there is just one record per item that is valid at any given time (for instance job assignment details).

But now I have a table where multiple rows can be valid for a given date and it is giving me grief. This particular table is about job funding and at any given time a job may be funded out of several accounts. Here is a simplified example of the data:

EMP_ID POSN EFF_DATE FUND ORG PCT DENSE_RANK
56332 001071 2010-07-01 22086 182030 4.00 1
56332 001071 2010-07-01 24095 184001 10.00 1
56332 001071 2010-07-01 22065 182024 20.00 1
56332 001071 2010-07-01 14001 000121 26.00 1
56332 001071 2010-07-01 22088 182031 40.00 1
56332 001071 2010-12-01 24095 184001 14.00 2
56332 001071 2010-12-01 14001 000121 21.00 2
56332 001071 2010-12-01 22065 182024 25.00 2
56332 001071 2010-12-01 22088 182031 40.00 2
56332 001071 2011-04-01 22086 182030 4.00 3
56332 001071 2011-04-01 24095 184001 10.00 3
56332 001071 2011-04-01 22088 182031 40.00 3
56332 001071 2011-04-01 22065 182024 46.00 3

So my initial attempt was to do something like :

LEAD(eff_date - 1) OVER (PARTITION BY emp_id, posn, fund, org ORDER BY eff_date)

But that wont work in the above example because the fund 22086 drops off completely on 12/01 and returns 4/01. So if I used that approach the result would be wrong between 12/01 and 4/01. What I really need is the value from the next window but nothing seems to work that way. What I need is something like:

MIN ( CASE WHEN eff_date > CURRENT_ROW.eff_date THEN eff_date END ) OVER (
PARTITION BY emp_id, posn ORDER BY eff_date) <- a way not to include the current value

MIN(eff_date) OVER (PARTITION BY emp_id, posn ORDER BY eff_date RANGE UNBOUNDED FOLLOWING) <- not include current row

MIN(eff_date) OVER(PARTITION BY emp_id, posn ORDER BY eff_date RANGE BETWEEN CURRENT ROW + 1 AND UNBOUNDED FOLLOWING)

BTW, here is the correctly working dense rank part:
DENSE_RANK() OVER (PARTITION BY emp_id, posn ORDER BY eff_date)

Is there any way to do this with analytic functions or am I going to have to resort to a subquery?

Scott Bailey

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2011-06-17 18:15:13 Re: [HACKERS] Issues with generate_series using integer boundaries
Previous Message Scott Ribe 2011-06-17 17:35:11 Re: 2 questions re RAID