Re: a newbie question on table design

From: "Leif B(dot) Kristensen" <leif(at)solumslekt(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: a newbie question on table design
Date: 2008-02-15 14:43:57
Message-ID: 200802151543.57443.leif@solumslekt.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Friday 15. February 2008, fdu(dot)xiaojf(at)gmail(dot)com wrote:
>Hi all,
>
>I have a large sdf file with many records of molecules and associated
>data items and I want to save them in a PostgreSQL database. There are
>about less than 40 data items for every molecule(the names of the data
>items fore each molecule are the same, but values differ). The number
>of molecules may exceed 20 million.
>
>Now I have come up with two ways to construct the table:
>
>1) a table with about 40 columns, every data item has its
> corresponding column, and one molecule corresponds to one row in the
> table.
>
>This is direct and simple. The drawbacks is if I want to add more data
>types to the database, I have to modify the structure of the table.
>
>2) a table with just 3 columns:
>
> CREATE TABLE mytable(
> id serial,
> data_name text,
> data_value text
> );
>
>Then a single molecule will corresonds to about 40 rows in the
> database.

This is a sound concept, but I'd rather store the data_name in a
separate table with an integer key, and replace data_name in mytable
with a data_name_fk REFERENCES data_names (data_name_id). That's just
Occam's Razor applied to database design, aka first normal form.

You'd probably store the name of the molecule in a third table. Then you
have a model very similar to the classic 'book database' where a book
can have multiple authors, and an author can have multiple books. There
are examples for this design all over the place.

>If I need to add more data types to the table, I just need to add new
>rows with new "data_name" column values. The drawback of this table is
>it has too many rows(40 times of the former one) and waste a lot
> space.
>
>Which one is better, or there are some other smart ways ?
>
>I have another question. Since the data I need to save is huge, is it
>appropriate that I save the data value in compressed format ?

That sounds a lot like premature optimization. Postgres is actually
quite good at compacting data natively.
--
Leif Biberg Kristensen | Registered Linux User #338009
http://solumslekt.org/ | Cruising with Gentoo/KDE
My Jazz Jukebox: http://www.last.fm/user/leifbk/

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Oleg Bartunov 2008-02-15 14:52:12 Re: a newbie question on table design
Previous Message Douglas McNaught 2008-02-15 14:41:10 Re: Are indexes blown?