Re: Best practices for aggregate table design

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: droberts <david(dot)roberts(at)riverbed(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Best practices for aggregate table design
Date: 2015-10-06 16:26:12
Message-ID: CAKFQuwYu9gRw82dECvOwCaMDNV5yO5o+8xNzmGG=hWfzRqWtFg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Oct 6, 2015 at 11:59 AM, droberts <david(dot)roberts(at)riverbed(dot)com>
wrote:

> Hi,
> I'm trying to construct an agg table to capture phone call data and group
> by
> state, city and time but also want just general measures by month. I'm
> thinking to have this:
>
> month | city_id | state_id | total_calls_inbound | total_calls_outbound |
> total_calls
>
> 2015-01 12 2 54 2 56
> 2015-01 10 4 147 15 162
> 2015-01 null null 201 17 218
>
>
> -----------------------
>
> and a dimension table to easily convert city, state into their string
> versions and also provide other attributes (e.g. GPS coordinates).
>
> My questions are:
> 1. I'm including 'total_calls' in the schema even thought it could easily
> be
> calculated from inbound + outbound. I did this for simplicity in a REST
> call, is that a bad idea?
>

​Hard to say​ given the limited insight into the use case. More concerned
about writing since on the read side you can easily wrap the table in a
view that provides the derived value as a column. You are also trading
space for processing power. You only end up processing the small subset
actively being queried presently while you end up storing the derived data
for every single record even if it is likely never to be queried again - or
at least queried in a highly time-sensitive environment.

2. I'm adding a 'null' row to show all the calls for a given month
> regardless of city or state, again to simplify the client side. It adds a
> row and is somewhat sparse but preferrable by the developer. Acceptable
> practice?
>

​I would make up a city_id and state_id representing "ALL" and use that in
place of NULL.

Storing derived information is a matter of making calculated trade-offs in
risking data anomalies in exchange for performance benefits. Choosing to
go this route is likely worthwhile if you can execute it correctly.

Neither choice is flat-out wrong. Beyond that it takes more information
than provided to pass judgement.

David J.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message John McKown 2015-10-06 16:38:32 Re: Processing data from table using awk.
Previous Message droberts 2015-10-06 15:59:59 Best practices for aggregate table design