Re: Given a set of daterange, finding the continuous range that includes a particular date

From: Ken Tanzer <ken(dot)tanzer(at)gmail(dot)com>
To: Viktor Fougstedt <viktor(at)chalmers(dot)se>
Cc: PG-General Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Given a set of daterange, finding the continuous range that includes a particular date
Date: 2018-02-24 01:07:19
Message-ID: CAD3a31VXGX-o=9OHEHgD7Ou0iwHnZ6Bnd-dm8FDkokL9-tN8Tw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Fri, Feb 23, 2018 at 6:42 AM, Viktor Fougstedt <viktor(at)chalmers(dot)se>
wrote:

>
> Hi.
>
> This answer is perhaps useful if I understand your problem correctly. But
> I might have interpreted it wrongly. :-)
>
>
> I would probably start with merging intervals so that overlapping and
> adjacent intervals become single continuous intervals, then select from
> those merged intervals.
>
> We have an application with a lot of interval handling in PostgreSQL, and
> we use many variants of algorithms based on one by Itzik Ben Gan which he
> calls “packing intervals”. The post we started with was an old reader’s
> challenge from SQL Magazine. It has since been updated when MS-SQL started
> supporting window functions better (Itzik Ben Gan is a MS-SQL-guy).
>
> Basically, it is a few CTE:s which convert the intervals into “start” (+1)
> and “stop” (-1) events, then keeps a running sum of these, and finally
> creates new intervals by noting that the merged (or “packed”) intervals
> starts with events that had sum=0 before them and stops with events which
> have sum=0 after them.
>
> It involves both CTE:s and window functions and is quite a beautiful
> example of SQL, IMHO.
>
> I think it’s best to google it, he does a good job of explaining how it
> works.
>
> Hope that helps a bit at least.
>
> /Viktor
>
> Hi. Thanks for the many helpful responses! Although I didn't end up with
exactly what I was looking for, I think I got to something that works good
enough for now. I did it with CTEs, in a way I think similar to what you
were suggesting (my printer is out of ink, so I didn't actually get to look
at that book yet!) I ended up having to do 4 passes:

1) Identify the starts and ends of continuous ranges
2) Eliminate the middle-point records (non stops/starts)
3) Merge the stop date in with the starts
4) Eliminate the stops

I couldn't see how to do it in less steps, but if there's a way...

I posted the query along with some sample data below.

Thanks again!

Ken

BEGIN;

CREATE TEMP TABLE sample_data (
client_id INTEGER,
start_date DATE,
end_date DATE,
episode INTEGER -- I won't actually have this info; just a label I added
for clarity and testing
);

INSERT INTO sample_data VALUES

(1,'1990-01-01','1990-12-31',0),
(1,'1991-01-01','1991-12-31',0),
(1,'1995-01-01','1995-06-30',1),
(1,'2000-01-01','2000-12-31',2),
(1,'2001-01-01','2001-12-31',2),
(1,'2002-01-01','2002-12-31',2),
(1,'2003-01-01','2003-12-31',2),
(1,'2004-01-01','2004-12-31',2),
(1,'2005-01-01','2005-12-31',2),
(1,'2006-01-01','2006-12-31',2),
(1,'2014-01-01','2014-12-31',3),
(1,'2015-01-01','2015-12-31',3),
(1,'2017-06-30','2017-12-31',4),
(1,'2018-01-01',NULL,4),

(2,'2014-02-01','2015-01-31',0),
(2,'2015-02-01','2015-12-31',0),
(2,'2017-09-30','2018-01-31',1),
(2,'2018-02-01','2018-02-14',1)
;

WITH peek3 AS (
WITH peek2 AS (
WITH peek AS (
SELECT
client_id,
episode,
daterange(start_date,end_date,'[]') AS range,
COALESCE(daterange(start_date,end_date+1,'[]') &&
lead(daterange(start_date,end_date,'[]')) OVER (PARTITION BY client_id
ORDER BY start_date),false) AS continues,
COALESCE(daterange(start_date,end_date,'[]') &&
lag(daterange(start_date,end_date+1,'[]')) OVER (PARTITION BY client_id
ORDER BY start_date),false) AS is_continued
FROM
sample_data
)
SELECT
*
FROM peek
WHERE NOT (is_continued AND continues)
)
SELECT client_id,episode,range,
daterange(lower(range),
CASE WHEN lead(is_continued) OVER (PARTITION BY client_id ORDER BY range)
THEN
lead(upper(range)) OVER (PARTITION BY client_id ORDER BY range)
ELSE upper(range)
END) AS full_range
,continues,is_continued
FROM peek2
)
SELECT * FROM peek3
WHERE NOT is_continued
;

--
AGENCY Software
A Free Software data system
By and for non-profits
*http://agency-software.org/ <http://agency-software.org/>*
*https://demo.agency-software.org/client
<https://demo.agency-software.org/client>*
ken(dot)tanzer(at)agency-software(dot)org
(253) 245-3801

Subscribe to the mailing list
<agency-general-request(at)lists(dot)sourceforge(dot)net?body=subscribe> to
learn more about AGENCY or
follow the discussion.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Ken Tanzer 2018-02-24 02:11:51 Re: Given a set of daterange, finding the continuous range that includes a particular date (aggregates)
Previous Message Thomas Munro 2018-02-23 23:19:29 Re: Query error: could not resize shared memory segment