Re: Database/Table Design for Global Country Statistics

From: Richard Huxton <dev(at)archonet(dot)com>
To: Stefan Schwarzer <stefan(dot)schwarzer(at)grid(dot)unep(dot)ch>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Database/Table Design for Global Country Statistics
Date: 2007-09-12 08:53:47
Message-ID: 46E7A91B.5070706@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Stefan Schwarzer wrote:
> Hi there,
>
> I learned in another posting that my table design - in a polite way -
> "could be improved".
>
> So, before doing any additional design errors, I would like to get
> feedback, if possible.
>
> I am dealing with some 500 tables for worldwide national statistics
> (GDP, population, environment etc.), covering approx. 30 years each. For
> each of these variables, I usually have as well (pre-prepared)
> subregional and regional aggregations too. These could - and should - at
> the end be calculated on-the-fly, and not pre-calculated and imported
> from Excel as it is for the moment.

Might want a trigger to keep them up to date.

> My (national) table for a given variable is in the moment as follows (id
> being the identifier for a specific country):
>
> id | 1970 | 1971 | ... | 2004 | 2005
> -------------------------------------------------------------------
> 1 | NULL | 36 | ... | 42 | 45
> 2 ......

Ick - fiddly if you we ever reach 2006...

> The new design would be like this:
>
> id | year | value
> -------------------------------
> 1 | 1970 | NULL
> 1 | 1971 | 36
> 1 ....
> 1 | 2005 | 45
> 2 | 1970 | ....
> 2 .....
>
>
> Would that be considered as "good table design" then?

Well if "id" is a country I'd rename it "country_id".

If all years+countries should have the same set of measurements you
might want it in one table:
(country_id, year, population, area, roads_in_km, ...)

However, if some measurements aren't relevant you're probably better off
with separate table for each measurement: country_population,
country_area etc.

The issue with NULLs is what do they mean. They should mean "unknown",
nothing more and nothing less. However, frequently you want to
distinguish between "no figure available" and "not applicable" (e.g.
"monarch" isn't relevant except in monarchies).

In this case it's probably best practice to separate out the values:
(country_id integer, year integer, has_monarch boolean, monarch text)
This way you can distinguish between has_monarch=false and monarch=''

--
Richard Huxton
Archonet Ltd

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tino Wildenhain 2007-09-12 09:16:12 Re: Database/Table Design for Global Country Statistics
Previous Message Richard Huxton 2007-09-12 08:45:00 Re: Question to schema public