From: | mariusz <marius(at)mtvk(dot)pl> |
---|---|
To: | Ken Tanzer <ken(dot)tanzer(at)gmail(dot)com> |
Cc: | Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, PG-General Mailing List <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Given a set of daterange, finding the continuous range that includes a particular date |
Date: | 2018-02-23 14:10:00 |
Message-ID: | 1519395000.32586.71.camel@mtvk.pl |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Thu, 2018-02-22 at 17:23 -0800, Ken Tanzer wrote:
>
>
> On Thu, Feb 22, 2018 at 5:05 PM, Adrian Klaver
> <adrian(dot)klaver(at)aklaver(dot)com> wrote:
> On 02/22/2018 04:58 PM, Ken Tanzer wrote:
>
>
>
> On Thu, Feb 22, 2018 at 4:53 PM, Adrian Klaver
> <adrian(dot)klaver(at)aklaver(dot)com
> <mailto:adrian(dot)klaver(at)aklaver(dot)com>> wrote:
>
> On 02/22/2018 04:44 PM, Ken Tanzer wrote:
>
> Hi, hoping to get some help with this. I'm
> needing to take a
> specific date, a series of dateranges and,
> given a specific
> date, return a single conitinuous daterange
> that includes that date.
>
> To elaborate a bit, I've got lots of tables
> that include start
> and end dates. For example:
>
> CREATE TABLE tbl_staff_assign (
> staff_assign_id SERIAL PRIMARY
> KEY,
> client_id INTEGER NOT NULL
> REFERENCES
> tbl_client (client_id),
> staff_id INTEGER
> REFERENCES
> tbl_staff(staff_id),
> staff_assign_type_code VARCHAR(10)
> NOT NULL
> REFERENCES tbl_l_staff_assign_type
> (staff_assign_type_code),
> staff_assign_date DATE NOT
> NULL,
> staff_assign_date_end DATE,
> ...
>
> So a client might leave a progrma and then
> return later, or they
> might simply switch to another staff_id. (In
> which case one
> record will have and end date, and the next
> record will start on
> the next day.) In this case I need to know
> "what period were
> they continuously in the program that includes
> X date?" So I'd
> like to be able to do something like:
>
> "SELECT
> staff_assign_date,continuous_daterange(
> staff_assign_date, (SELECT
>
> array_agg(daterange(staff_assign_date,staff_assign_date_end,'[]')
> ) FROM staff_assign sa2 WHERE
> sa2.client_id=sa.client_id) FROM
> staff_assign sa
>
> I've done this before with procedures specific
> to a particular
> table, and working with the start and end
> dates. I'm now
> wanting to try to do this once generically
> that will work for
> all my cases. So I'm hoping to do this in a
> way that
> performance isn't horrible. And it's a little
> unclear to me how
> much and how I might be able to use the
> daterange operators to
> accomplish this efficiently.
>
>
> The operator I use to solve similar problems:
>
>
> https://www.postgresql.org/docs/10/static/functions-range.html
>
> <https://www.postgresql.org/docs/10/static/functions-range.html>
>
> @> contains element
> '[2011-01-01,2011-03-01)'::tsrange
> @> '2011-01-10'::timestamp t
>
>
> Thanks Adrian. But how would you apply that to this
> situation, where I have a series of (quite possibly
> discontinuous) dateranges?
>
>
> This is going to depend on a more formal definition of the
> problem with some sample data. Right now I am trying to
> reconcile "what period were they continuously in the program
> that includes X date?" with "where I have a series of (quite
> possibly discontinuous) dateranges? " Maybe its just me, I
> can't see how discontinuous can also be continuously.
>
>
>
>
> Start End
> -- Episode 1
> 1/1/16 3/30/16
> 4/1/16 4/30/16
> -- Episode 2
>
> 1/1/18 1/31/18
> 2/1/18 NULL
>
>
> Given 1/15/18, should return 1/1/18-->NULL
> Given 1/15/16, should return 1/1/16-->4/30/16
>
i guess, you can easily get max continuous range for each row with
something like this:
CREATE OR REPLACE FUNCTION
append_daterange(d1 daterange, d2 daterange)
RETURNS daterange
LANGUAGE sql
AS
$$
SELECT CASE WHEN d1 && d2 OR d1 -|- d2 THEN d1 + d2 ELSE d2 END;
$$;
CREATE AGGREGATE agg_daterange (daterange) (
sfunc = append_daterange,
stype = daterange
);
SELECT dr,
lower(agg_daterange(dr) OVER (ORDER BY dr ASC)),
upper(agg_daterange(dr) OVER (ORDER BY dr DESC))
FROM ...
above example is simplified to selecting only daterange column "dr" for
readability, which in your case should be something like
daterange(staff_assign_date,staff_assign_date_end,'[)')
please note that daterange would be normalized to [) format so upper()
above returns exactly your max "staff_assign_date_end" for each
continuous range when dateranges are created with '[)' format.
the key point is ... ELSE d2 in append_daterange() which starts with new
value each time that new value is discontinuous with agg's state value
and order in which rows are processed (ASC for lower of daterange, DESC
for upper of daterange).
unfortunately this involves reading all rows for "client_id" and
additional sorting for each window.
i recall reading that you already pay the price of reading all rows for
client_id anyway, so the only question is the cost of two additional
sorts (maybe extracting dateranges to subset on which to do windows and
rejoining result of continuous ranges to original set would help to
lower the cost).
another way would be recursive cte, easy to append continuous dateranges
one by one from a set of dateranges staring with some date of choice,
but i doubt it would be cheaper, not to mention additional
postprocessing to get what you really want. and definitely would be less
readable.
i guess, from that point where for each row you have start and end date
of max continuous range including given row, you can easily format or
filter whatever you want
> Thinking about this a bit more, I'm wondering if a window function
> could be used for this? I've used them a bit, but definitely never
> wrote one. Something like
>
>
> continuous_daterange(staff_assign_date,daterange(staff_assign_date,staff_assign_date_end,'[[]') OVER (PARTION BY client_id)
>
>
> But then a window function can only see the rows included in the
> query, yes? In which case this would only work if you queried for all
> the rows for a particular client_id? I guess in the example I gave I
> was doing that anyway, so maybe this would be no worse.
>
>
> Ken
> --
>
> AGENCY Software
> A Free Software data system
> By and for non-profits
> http://agency-software.org/
> https://demo.agency-software.org/client
>
> ken(dot)tanzer(at)agency-software(dot)org
> (253) 245-3801
>
>
> Subscribe to the mailing list to
> learn more about AGENCY or
> follow the discussion.
From | Date | Subject | |
---|---|---|---|
Next Message | Viktor Fougstedt | 2018-02-23 14:42:19 | Re: Given a set of daterange, finding the continuous range that includes a particular date |
Previous Message | PT | 2018-02-23 11:02:13 | Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later |