Quick Links

Re: join tables vs. denormalization by trigger

From:	Mark Lewis <mark(dot)lewis(at)mir3(dot)com>
To:	Walter Mauritz <waltermauritz(at)gmx(dot)at>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: join tables vs. denormalization by trigger
Date:	2007-09-04 19:18:00
Message-ID:	1188933480.5367.40.camel@archimedes
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On Tue, 2007-09-04 at 20:53 +0200, Walter Mauritz wrote:
> Hi,
>
> I wonder about differences in performance between two scenarios:
>
> Background:
> Table A, ~50,000 records
> Table B, ~3,000,000 records (~20 cols)
> Table C, ~30,000,000 records (~10 cols)
>
> a query every 3sec. with limit 10
>
> Table C depends on Table B wich depends on Table A, int8 foreign key, btree index
>
> * consider it a read only scenario (load data only in night, with time for vacuum analyze daily)
> * im required to show records from Table C, but also with some (~5cols) info from Table B
> * where clause always contains the foreign key to Table A
> * where clause may contain further 1-10 search parameter
>
>
> Scenario A)
> simply inner join Table B + C
>
> Scenario B)
> with use of trigger on insert/update I could push the required information from table B down to table C.
> -> so i would only require to select from table C.
>
>
> My question:
> 1) From your experience ... how much faster (approximately) in percent do you regard Scenario B faster than A ?

You're assuming that B is always going to be faster than A, which
certainly isn't a foregone conclusion. Let's say that you average 10
bytes per column. In scenario A, the total data size is then roughly
3,000,000 * 20 * 10 + 30,000,000 * 10 * 10 = 3.6 GiB. In scenario B due
to your denormalization, the total data size is more like 30,000,000 *
30 * 10 = 9 GiB, or 2.5 times more raw data.

That's a lot of extra disk IO, unless your database will always fit in
memory in both scenarios.

Although you didn't provide enough data to answer with certainty, I
would go on the assumption that A is going to be faster than B. But
even if it weren't, remember that premature optimization is the root of
all evil. If you try A and it doesn't perform fast enough, then you can
always try B later to see if it works any better.

-- Mark Lewis

In response to

join tables vs. denormalization by trigger at 2007-09-04 18:53:21 from Walter Mauritz

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Carlo Stonebanks	2007-09-04 22:45:24	Performance on 8CPU's and 32GB of RAM
Previous Message	Bill Moran	2007-09-04 19:01:02	Re: About autovacuum