From: | "Leif B(dot) Kristensen" <leif(at)solumslekt(dot)org> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: a newbie question on table design |
Date: | 2008-02-15 14:43:57 |
Message-ID: | 200802151543.57443.leif@solumslekt.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Friday 15. February 2008, fdu(dot)xiaojf(at)gmail(dot)com wrote:
>Hi all,
>
>I have a large sdf file with many records of molecules and associated
>data items and I want to save them in a PostgreSQL database. There are
>about less than 40 data items for every molecule(the names of the data
>items fore each molecule are the same, but values differ). The number
>of molecules may exceed 20 million.
>
>Now I have come up with two ways to construct the table:
>
>1) a table with about 40 columns, every data item has its
> corresponding column, and one molecule corresponds to one row in the
> table.
>
>This is direct and simple. The drawbacks is if I want to add more data
>types to the database, I have to modify the structure of the table.
>
>2) a table with just 3 columns:
>
> CREATE TABLE mytable(
> id serial,
> data_name text,
> data_value text
> );
>
>Then a single molecule will corresonds to about 40 rows in the
> database.
This is a sound concept, but I'd rather store the data_name in a
separate table with an integer key, and replace data_name in mytable
with a data_name_fk REFERENCES data_names (data_name_id). That's just
Occam's Razor applied to database design, aka first normal form.
You'd probably store the name of the molecule in a third table. Then you
have a model very similar to the classic 'book database' where a book
can have multiple authors, and an author can have multiple books. There
are examples for this design all over the place.
>If I need to add more data types to the table, I just need to add new
>rows with new "data_name" column values. The drawback of this table is
>it has too many rows(40 times of the former one) and waste a lot
> space.
>
>Which one is better, or there are some other smart ways ?
>
>I have another question. Since the data I need to save is huge, is it
>appropriate that I save the data value in compressed format ?
That sounds a lot like premature optimization. Postgres is actually
quite good at compacting data natively.
--
Leif Biberg Kristensen | Registered Linux User #338009
http://solumslekt.org/ | Cruising with Gentoo/KDE
My Jazz Jukebox: http://www.last.fm/user/leifbk/
From | Date | Subject | |
---|---|---|---|
Next Message | Oleg Bartunov | 2008-02-15 14:52:12 | Re: a newbie question on table design |
Previous Message | Douglas McNaught | 2008-02-15 14:41:10 | Re: Are indexes blown? |