Data warehousing requirements

From: Gabriele Bartolini <angusgb(at)tin(dot)it>
To: pgsql-performance(at)postgresql(dot)org
Subject: Data warehousing requirements
Date: 2004-10-06 21:36:05
Message-ID: 6.1.2.0.2.20041006230239.0201bd40@box.tin.it
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi guys,

I just discussed about my problem on IRC. I am building a Web usage
mining system based on Linux, PostgreSQL and C++ made up of an OLTP
database which feeds several and multi-purpose data warehouses about users'
behaviour on HTTP servers.

I modelled every warehouse using the star schema, with a fact table and
then 'n' dimension tables linked using a surrogate ID.

Discussing with the guys of the chat, I came up with these conclusions,
regarding the warehouse's performance:

1) don't use referential integrity in the facts table
2) use INTEGER and avoid SMALLINT and NUMERIC types for dimensions' IDs
3) use an index for every dimension's ID in the fact table

As far as administration is concerned: run VACUUM ANALYSE daily and
VACUUM FULL periodically.

Is there anything else I should keep in mind?

Also, I was looking for advice regarding hardware requirements for a
data warehouse system that needs to satisfy online queries. I have indeed
no idea at the moment. I can only predict 4 million about records a month
in the fact table, does it make sense or not? is it too much?

Data needs to be easily backed up and eventually replicated.

Having this in mind, what hardware architecture should I look for? How
many hard disks do I need, what kind and what RAID solution do you suggest
me to adopt (5 or 10 - I think)?

Thank you so much,
-Gabriele
--
Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check
maintainer
Current Location: Prato, Toscana, Italia
angusgb(at)tin(dot)it | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The
Inferno

Attachment Content-Type Size
unknown_filename text/plain 169 bytes

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Paul Ramsey 2004-10-06 22:26:47 Re: The never ending quest for clarity on shared_buffers
Previous Message Tom Lane 2004-10-06 20:20:03 Re: sequential scan on select distinct