Re: Populating missing dates in postgresql data

From: Alban Hertroys <haramrae(at)gmail(dot)com>
To: "Lavrenz, Steven M" <slavrenz(at)purdue(dot)edu>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Populating missing dates in postgresql data
Date: 2015-03-26 08:08:04
Message-ID: D5409921-71A5-40EC-B63D-EA50DD2DC329@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


> On 26 Mar 2015, at 1:25, Lavrenz, Steven M <slavrenz(at)purdue(dot)edu> wrote:
>
> Alright everyone, this is a doozy of a problem. I am new to Postgres so I appreciate patience/understanding. I have a database of hardware objects, each of which has several different “channels”. Once per day, these channels are supposed
>
>
>
> object_id channel check-in date
>
> ****************************************
>
> 990 1 2014-12-01
>
> 990 1 2014-12-02
>
> 990 2 2014-12-01
>
> 990 2 2014-12-02
>
> 286 2 2014-12-01
>
> 286 2 2014-12-02
>
> 286 5 2014-12-01
>
> 286 5 2014-12-02
>
> 4507 1 2014-12-01
>
> 4507 1 2014-12-02
>
> 4507 2 2014-12-01
>
> 4507 2 2014-12-02
>
>
>
> And so on. Occasionally, communications will break down to the hardware, such that no reporting occurs. object 286 loses communications on 12/1/2014. Then the table might look like:
>
>
>
> object_id channel check-in date
>
> ****************************************
>
> 990 1 2014-12-01
>
> 990 1 2014-12-02
>
> 990 2 2014-12-01
>
> 990 2 2014-12-02
>
>
>
> 286 2 2014-12-02
>
>
>
> 286 5 2014-12-02
>
> 4507 1 2014-12-01
>
> 4507 1 2014-12-02
>
> 4507 2 2014-12-01
>
> 4507 2 2014-12-02
>
>
>
> Or let’s say that for some reason, just channel 2 loses reporting for a day. Then we would have:
>
>
>
> object_id channel check-in date
>
> ****************************************
>
> 990 1 2014-12-01
>
> 990 1 2014-12-02
>
> 990 2 2014-12-01
>
> 990 2 2014-12-02
>
>
>
> 286 2 2014-12-02
>
> 286 5 2014-12-01
>
> 286 5 2014-12-02
>
> 4507 1 2014-12-01
>
> 4507 1 2014-12-02
>
> 4507 2 2014-12-01
>
> 4507 2 2014-12-02
>
>
>
> I have a second table (TABLE B) with all of the object_ids and channels that are supposed to be reporting in each day. For cases where a certain channel does not check in, I want to add a column that indicates the comm failure. So, for
>
>
>
> object_id channel check-in date comm failure
>
> **********************************************************
>
> 990 1 2014-12-01 No
>
> 990 1 2014-12-02 No
>
> 990 2 2014-12-01 No
>
> 990 2 2014-12-02 No
>
> 286 2 2014-12-01 Yes
>
> 286 2 2014-12-02 No
>
> 286 5 2014-12-01 Yes
>
> 286 5 2014-12-02 No
>
> 4507 1 2014-12-01 No
>
> 4507 1 2014-12-02 No
>
> 4507 2 2014-12-01 No
>
> 4507 2 2014-12-02 No
>
>
>
>
>
> I have been racking my mind for the better part of a day on how to do this. The thing is that I can do a right join of TABLE B on TABLE A, and this will populate the missing object ids and channels. However, this only works for a single
>
>
>
> object_id channel check-in date comm failure
>
> **********************************************************
>
> 990 1 2014-12-01 No
>
> 990 1 2014-12-02 No
>
> 990 2 2014-12-01 No
>
> 990 2 2014-12-02 No
>
> 286 2 Yes
>
> 286 2 2014-12-02 No
>
> 286 5 Yes
>
> 286 5 2014-12-02 No
>
> 4507 1 2014-12-01 No
>
> 4507 1 2014-12-02 No
>
> 4507 2 2014-12-01 No
>
> 4507 2 2014-12-02 No
>
>
>
> I need to do a count of comm failures by day, so I need to populate the check-in date field. Please help!

Easiest would be to insert the missing values in your table, something like:

WITH RECURSIVE calendar (missing_date) AS (
SELECT MAX(check_in_date) FROM table_a WHERE comm_failure = 'YES'
UNION ALL
SELECT missing_date + interval '1 day' FROM calendar
WHERE missing_date < CURRENT_DATE
)
INSERT INTO table_a (object_id, channel, check_in_date, comm_failure)
SELECT b.object_id, b.channel. c.missing_date, 'YES'
FROM table_b b, calendar c
WHERE NOT EXISTS (
SELECT 1
FROM table_a a
WHERE a.object_id = b.object_id
AND a.channel = b.channel
AND a.check_in_date = c.missing_date
);

That's off the top of my head, untested, etc, but I think I got that mostly right.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Craig Ringer 2015-03-26 09:14:07 Re: BDR - triggers on receiving node?
Previous Message David G. Johnston 2015-03-26 04:31:24 Re: Autovacuum query