Re: Populating missing dates in postgresql data

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: "Lavrenz, Steven M" <slavrenz(at)purdue(dot)edu>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Populating missing dates in postgresql data
Date: 2015-03-26 00:38:45
Message-ID: CAKFQuwaaw27EoO7M0z5FemJxHD39RGxhbeixZRY5vPKvExFPiQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Mar 25, 2015 at 5:25 PM, Lavrenz, Steven M <slavrenz(at)purdue(dot)edu>
wrote:

> Alright everyone, this is a doozy of a problem. I am new to Postgres so
> I appreciate patience/understanding. I have a database of hardware objects,
> each of which has several different “channels”. Once per day, these
> channels are supposed to check in with a central server, generating an
> event log table (TABLE A) like the following:
>
>
>
> object_id channel check-in date
>
> ****************************************
>
> 990 1 2014-12-01
>
> 990 1 2014-12-02
>
> 990 2 2014-12-01
>
> 990 2 2014-12-02
>
> 286 2 2014-12-01
>
> 286 2 2014-12-02
>
> 286 5 2014-12-01
>
> 286 5 2014-12-02
>
> 4507 1 2014-12-01
>
> 4507 1 2014-12-02
>
> 4507 2 2014-12-01
>
> 4507 2 2014-12-02
>
>
>
> And so on. Occasionally, communications will break down to the hardware,
> such that no reporting occurs. For example, let’s say that object 286 loses
> communications on 12/1/2014. Then the table might look like:
>
>
>
> object_id channel check-in date
>
> ****************************************
>
> 990 1 2014-12-01
>
> 990 1 2014-12-02
>
> 990 2 2014-12-01
>
> 990 2 2014-12-02
>
>
>
> 286 2 2014-12-02
>
>
>
> 286 5 2014-12-02
>
> 4507 1 2014-12-01
>
> 4507 1 2014-12-02
>
> 4507 2 2014-12-01
>
> 4507 2 2014-12-02
>
>
>
> Or let’s say that for some reason, just channel 2 loses reporting for a
> day. Then we would have:
>
>
>
> object_id channel check-in date
>
> ****************************************
>
> 990 1 2014-12-01
>
> 990 1 2014-12-02
>
> 990 2 2014-12-01
>
> 990 2 2014-12-02
>
>
>
> 286 2 2014-12-02
>
> 286 5 2014-12-01
>
> 286 5 2014-12-02
>
> 4507 1 2014-12-01
>
> 4507 1 2014-12-02
>
> 4507 2 2014-12-01
>
> 4507 2 2014-12-02
>
>
>
> I have a second table (TABLE B) with all of the object_ids and channels
> that are supposed to be reporting in each day. For cases where a certain
> channel does not check in, I want to add a column that indicates the comm
> failure. So, for the example where all channels on object 286 do not check
> in, I would like to get is something like this:
>
>
>
> object_id channel check-in
> date comm failure
>
> **********************************************************
>
> 990 1
> 2014-12-01 No
>
> 990 1
> 2014-12-02 No
>
> 990 2
> 2014-12-01 No
>
> 990 2
> 2014-12-02 No
>
> 286 2
> 2014-12-01 Yes
>
> 286 2
> 2014-12-02 No
>
> 286 5
> 2014-12-01 Yes
>
> 286 5
> 2014-12-02 No
>
> 4507 1
> 2014-12-01 No
>
> 4507 1
> 2014-12-02 No
>
> 4507 2
> 2014-12-01 No
>
> 4507 2
> 2014-12-02 No
>
>
>
>
>
> I have been racking my mind for the better part of a day on how to do
> this. The thing is that I can do a right join of TABLE B on TABLE A, and
> this will populate the missing object ids and channels. However, this only
> works for a single day, and it gives me something like:
>
>
>
> object_id channel check-in
> date comm failure
>
> **********************************************************
>
> 990 1
> 2014-12-01 No
>
> 990 1
> 2014-12-02 No
>
> 990 2
> 2014-12-01 No
>
> 990 2
> 2014-12-02 No
>
> 286 2
> Yes
>
> 286 2
> 2014-12-02 No
>
> 286 5
> Yes
>
> 286 5
> 2014-12-02 No
>
> 4507 1
> 2014-12-01 No
>
> 4507 1
> 2014-12-02 No
>
> 4507 2
> 2014-12-01 No
>
> 4507 2
> 2014-12-02 No
>
>
>
> I need to do a count of comm failures by day, so I need to populate the
> check-in date field. Please help!
>
>
>

​People are much more inclined to help if you make it easy for them.

Read up on "CTE" (WITH clause) in the documentation. And "VALUES". Both
in the "SELECT" command section.

Use those to create inline versions of your two tables and then provide the
query you can get working.

The basic solution is:

WITH actual_checkins (id, channel, date, count_for_day) AS (...)
, expected_checkins (id, channel, date) AS (...)
SELECT id, channel, date, COALESCE(count_for_day, 0) AS number_of_checkins
FROM expected_checkins
LEFT JOIN actual_checkins USING (id, channel, date)

You can convert zero/non-true to "no/yes" via a CASE WHEN count = 0 THEN
'no' ELSE 'yes' END or similar.

David J.

​P.S. the function "generate_series(date, date)" may serve you well

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Mitu Verma 2015-03-26 03:57:34 Re: Populating missing dates in postgresql data
Previous Message Adrian Klaver 2015-03-26 00:38:03 Re: Populating missing dates in postgresql data