Re: Normalization or Performance

From: Richard Huxton <dev(at)archonet(dot)com>
To: Alvaro Nunes Melo <al_nunes(at)atua(dot)com(dot)br>
Cc: Pgsql-Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Normalization or Performance
Date: 2004-12-02 18:34:56
Message-ID: 41AF6050.9030302@archonet.com
Lists: pgsql-performance

Alvaro Nunes Melo wrote:
> Hi,
>
> Before writing this mail, I'd researched a little about this topic
> and got some opinions from guys like Fabien Pascal, who argues that
> logical design should be separated from physical design, and other
> sources. As this isn't settled, I'm writing to you guys, who make
> things work in the real world.

I believe he's right. Or at least that you should only compromise your
logical design once it becomes absolutely necessary due to physical
limitations.

> We started our first big (by our company's standards) project always
> thinking about normalization. But once we imported legacy data into
> the DB, things got harder.
>
> One example is the client's status. A client might be active, inactive
> or pending (for many reasons). We store every status a client has had
> since it entered the system. To check the current status of a
> client, we get the last status from this historical status table.
> This takes considerable time, so our best results were achieved by
> building a function that checks the status and indexing this
> function. The problem is that indexed functions must be immutable, so
> as you can figure, if the status changes after the creation of the
> index, the return of the function is still the same.
>
> What do you suggest for situations like this? Should I add a field to
> the clients table to store its current status, keep storing data in the
> historical table, and control its changes with a trigger?

Trigger + history table is a common solution; it's easy to implement, and
there's nothing non-relational about it.
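
Roughly, and only as a sketch with made-up names (clients, client_status,
client_id, status), something like:

  -- Hypothetical schema: clients(client_id, ...) plus a history table
  -- client_status(client_id, status, changed_at).
  ALTER TABLE clients ADD COLUMN current_status varchar(10);

  CREATE OR REPLACE FUNCTION set_current_status() RETURNS trigger AS '
  BEGIN
      -- Copy the newest status onto the client row whenever history grows.
      UPDATE clients
         SET current_status = NEW.status
       WHERE client_id = NEW.client_id;
      RETURN NEW;
  END;
  ' LANGUAGE plpgsql;

  CREATE TRIGGER client_status_current
      AFTER INSERT ON client_status
      FOR EACH ROW EXECUTE PROCEDURE set_current_status();

That keeps the full history intact, and "what is the status now?" becomes a
plain column lookup on clients rather than a scan of the history.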

> There are other situations that are making things difficult for us.
> For example, one query must return the total amount a client bought
> in the last 90 days. It's taking too long when we must run it for
> many clients, many times. So should I create summarization tables to
> store this kind of data, update them with a trigger on a daily basis
> (for example), and solve this problem with one join?

One solution I use for this sort of thing is a summary table grouped by
date, accurate up to the start of today. Then I check both the summary
table and the "live" table for today's information and sum the two.
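
Something along these lines, again with made-up names (a sales table with
client_id, sale_date, amount, summarised nightly):

  -- Hypothetical summary table, refreshed each night up to the start of today.
  CREATE TABLE daily_sales_summary (
      client_id  integer NOT NULL,
      sale_date  date    NOT NULL,
      total      numeric NOT NULL,
      PRIMARY KEY (client_id, sale_date)
  );

  -- 90-day total per client = summarised history + today's live rows.
  SELECT client_id, SUM(total) AS total_90_days
    FROM (SELECT client_id, total
            FROM daily_sales_summary
           WHERE sale_date >= CURRENT_DATE - 90
          UNION ALL
          SELECT client_id, amount
            FROM sales
           WHERE sale_date >= CURRENT_DATE) AS s
   GROUP BY client_id;

The summary rows only change once a day, so that part stays cheap; only
today's slice of the live table is scanned at query time.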

> Our database is not that big. The largest table has about 7,000,000
> rows. About 50,000 clients, half of them active. Everything I pointed
> out above uses indexes for its queries, etc. But even with this it's
> not been fast enough. We have a Dell server for this (I know, the Dell
> issue), a dual Xeon 2.8, SCSI HD, 1 GB mem. Do we need better
> hardware for our system?

Swap one of your processors for more RAM and disks, perhaps.

--
Richard Huxton
Archonet Ltd
