From: | Gabriele Bartolini <angusgb(at)tin(dot)it> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Data warehousing requirements |
Date: | 2004-10-06 21:36:05 |
Message-ID: | 6.1.2.0.2.20041006230239.0201bd40@box.tin.it |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hi guys,
I just discussed about my problem on IRC. I am building a Web usage
mining system based on Linux, PostgreSQL and C++ made up of an OLTP
database which feeds several and multi-purpose data warehouses about users'
behaviour on HTTP servers.
I modelled every warehouse using the star schema, with a fact table and
then 'n' dimension tables linked using a surrogate ID.
Discussing with the guys of the chat, I came up with these conclusions,
regarding the warehouse's performance:
1) don't use referential integrity in the facts table
2) use INTEGER and avoid SMALLINT and NUMERIC types for dimensions' IDs
3) use an index for every dimension's ID in the fact table
As far as administration is concerned: run VACUUM ANALYSE daily and
VACUUM FULL periodically.
Is there anything else I should keep in mind?
Also, I was looking for advice regarding hardware requirements for a
data warehouse system that needs to satisfy online queries. I have indeed
no idea at the moment. I can only predict 4 million about records a month
in the fact table, does it make sense or not? is it too much?
Data needs to be easily backed up and eventually replicated.
Having this in mind, what hardware architecture should I look for? How
many hard disks do I need, what kind and what RAID solution do you suggest
me to adopt (5 or 10 - I think)?
Thank you so much,
-Gabriele
--
Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check
maintainer
Current Location: Prato, Toscana, Italia
angusgb(at)tin(dot)it | http://www.prato.linux.it/~gbartolini | ICQ#129221447
> "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The
Inferno
Attachment | Content-Type | Size |
---|---|---|
unknown_filename | text/plain | 169 bytes |
From | Date | Subject | |
---|---|---|---|
Next Message | Paul Ramsey | 2004-10-06 22:26:47 | Re: The never ending quest for clarity on shared_buffers |
Previous Message | Tom Lane | 2004-10-06 20:20:03 | Re: sequential scan on select distinct |