Re: Offending My Tender Sensibilities -OR- OLTP on a Star Schema

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Roy Anderson <roy(dot)anderson(at)gmail(dot)com>, pgsql <pgsql-general(at)postgresql(dot)org>
Subject: Re: Offending My Tender Sensibilities -OR- OLTP on a Star Schema
Date: 2014-03-04 20:15:41
Message-ID: 1393964141.1266.YahooMailNeo@web122301.mail.ne1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Roy Anderson <roy(dot)anderson(at)gmail(dot)com> wrote:

> We have an OLTP database and no data warehouse. We are currently
> planning out a build for a data warehouse however (possibly using
> Hadoop). "X" is recommending that we convert our current,
> normalized OLTP database into a flattened Star Schema.

I'm not going to repeat good advice you've already gotten in other
answers, but I will point out that complex reporting off of
normalized data is often much faster if you have been able to use
natural keys, even if you need to go to multi-column primary keys
to do so.  One of the biggest down-sides of synthetic primary keys
(where, for example, you might have a single-column PK column
called "id" in every table) is that forces one particular route to
"navigate" the tables.  With natural keys a complex query often
finds intriguing plans to give the results you ask for using plans
you might never have thought of, and which can be orders of
magnitude faster than the plans which would be possible if the
joins are all done using synthetic keys.

Beyond that, I would say that I would never jump to some "star"
schema automatically.  There are various ways data can be
summarized for faster reporting, and a flattened star schema is
only one option, which is not always the fastest option -- or even
an improvement over 3NF.  At the risk of repeating a little, I
recommend looking at what it would take to generate the reports you
want of the current data; and only denormalize where something is
too slow, using the summarization which appears to be the most
sensible for the use case.

As an aside, I had a case where auditors wanted a particular report
off of a 3 TB OLTP database.  One programmer tried to write it
using imperative code and looping.  Based on how far it got in the
first 5 hours, it would have taken a year to complete.  Rewritten
with a couple CTEs as a single declarative query (one SELECT
statement) it ran in ten minutes.  No star schema needed -- just
some clear thinking, and making use of the power of declarative
coding and a great optimizer.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Merlin Moncure 2014-03-04 20:40:26 Re: Offending My Tender Sensibilities -OR- OLTP on a Star Schema
Previous Message Jerry Sievers 2014-03-04 20:15:03 Re: log_statement per table