Re: redundant fields in table for "performance optimizations"

From: Ondrej Ivanič <ondrej(dot)ivanic(at)gmail(dot)com>
To: Menelaos PerdikeasSemantix <mperdikeas(dot)semantix(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: redundant fields in table for "performance optimizations"
Date: 2012-08-21 22:51:57
Message-ID: CAM6mieLvwRBmL6RdE9LggOMhq94aQDQzwY87Boj5yAB0t-_2Mg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

On 22 August 2012 07:07, Menelaos PerdikeasSemantix
<mperdikeas(dot)semantix(at)gmail(dot)com> wrote:
> Let's say you have a father-child (or master-detail if you wish) hierarchy
> of tables of not just 2 levels, but, say, 5 levels.
> E.g. tables A, B, C, D and E organized in successive 1-to-N relationships:
>
> A ----1-to-N-----> B
> B ----1-to-N-----> C
> C ----1-to-N-----> D
> D ----1-to-N-----> E
>
> with appropriate foreign keys:
>
> * from E to D
> * from D to C
> * from C to B
> * from B to A
>
> This is normalized so far. Now assume that it is the case than in some
> queries on table E you also need to report a field that only exists on table
> A. This will mean a JOIN between five tables: E, D, C, B and A. Some
> questions follow:
>
> [1] assuming tables having a number of rows in the order of 100,000, after
> how many levels of depth would you feel justified to depart from the
> normalized schema and introduce some redundancy to speed up the queries?
>
> [3] do you feel this is a legitimate concern in a modern PostgreSQL database
> running on high end (200,000 USD) hardware and serving no more than 1000
> concurrent users with table sizes at the lowest (more detailed) level of the
> hierarchy in the order of a few tens of millions of rows at the most and
> dropping by a factor of 20 for each level up ?

I would ask different question(s): how "static" that tree structure is
and what kind of queries do you want to run:
- father-child: easy to understand; add new node; change leaf node;
hard to run some count(*) queries; and get hierarchy (CTEs are help
full)
- nested sets: pailful to move nodes around (even add new node); easy
to get tree subsets; ...

Anyway, I've found this summary:
http://stackoverflow.com/questions/4048151/what-are-the-options-for-storing-hierarchical-data-in-a-relational-database
when I was googling for Joe Celko's Trees and Hierarchies book.

--
Ondrej Ivanic
(ondrej(dot)ivanic(at)gmail(dot)com)

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Michael Clark 2012-08-22 03:09:31 Re: Problems with timestamp with time zone and old dates?
Previous Message Steve Crawford 2012-08-21 21:50:59 Re: Problems with timestamp with time zone and old dates?