From: | "Gurupartap Davis" <partap(at)yahoo(dot)com> |
---|---|
To: | "Francisco Reyes" <lists(at)natserv(dot)com> |
Cc: | "pgsql General List" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: pgsql and large tables |
Date: | 2001-11-16 01:01:36 |
Message-ID: | 004201c16e3a$3d915de0$0f00a8c0@marlows |
Lists: pgsql-general
Sheesh...I think I see what you're saying about normalization. I've never
had a formal db class, but it looks like you're talking about removing all
the redundancy from the database...and there's a lot in this one(!)
I could have a forecast table like so:
CREATE TABLE forecast (
    id serial PRIMARY KEY,
    model_id integer REFERENCES model (id),
    yearmoda date,
    modelruntime smallint
);
Then a temperature table:
CREATE TABLE temperature (
    forecast_id integer REFERENCES forecast (id),
    stn_id integer REFERENCES station (id),
    modelhr smallint,
    value numeric(6,2)
);
repeat for relative humidity, dewpoint, etc...
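For example, pulling one station's temperatures for a single model run out of that layout might look something like this (just a sketch; I'm guessing the model table has a name column and the station table a stn column, and the literal values are made up):

SELECT f.yearmoda, f.modelruntime, t.modelhr, t.value AS temp
FROM forecast f
JOIN model m ON m.id = f.model_id
JOIN temperature t ON t.forecast_id = f.id
JOIN station s ON s.id = t.stn_id
WHERE m.name = 'eta'              -- hypothetical model name
  AND s.stn = 'KPHX'              -- hypothetical station id
  AND f.yearmoda = '2001-11-15'
  AND f.modelruntime = 12
ORDER BY t.modelhr;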
zhr, zday, and modelrundate in the current table are redundant, as they are
derivative of the date and modelhr...
It looks like it would save a hella lot of disk space...probably over 50%,
seeing as how certain variables (dwpt, rh, lpre) only show up in a fraction
of the records...(ie, most records have NULL for these fields)
Dang...I don't know if my employer will let me make a change this radical,
though. We would have to redo every single query in about 50 scripts.
Hrmmm. I'm a programmer, not a DB admin...but my employer is too cheap
(errr, sorry..."thrifty") to hire a real DB admin :-/
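One thing that might soften the blow: I could probably glue the normalized tables back together with a view shaped like the old wide table, so the SELECTs in those scripts keep working while we migrate and only the loading code needs real changes. Something like this, assuming one narrow table per variable as sketched above and guessing at the model/station column names:

CREATE VIEW forecast_flat AS
SELECT m.name AS model,
       s.stn,
       f.yearmoda,
       f.modelruntime,
       t.modelhr,
       t.value AS temp,
       r.value AS rh,
       d.value AS dwpt
FROM forecast f
JOIN model m ON m.id = f.model_id
JOIN temperature t ON t.forecast_id = f.id
JOIN station s ON s.id = t.stn_id
LEFT JOIN rh r ON r.forecast_id = f.id AND r.stn_id = t.stn_id AND r.modelhr = t.modelhr
LEFT JOIN dwpt d ON d.forecast_id = f.id AND d.stn_id = t.stn_id AND d.modelhr = t.modelhr;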
>
> > I'm migrating this table from an existing mysql installation...
> > This is what it looks like right now:
> > Table "forecast"
> > Attribute | Type | Modifier
> > --------------+-----------------------+----------
> > zhr | smallint |
> > zday | smallint |
> > model | character varying(32) |
> > temp | numeric(6,2) |
> > modelhr | smallint | not null
> > modelruntime | smallint | not null
> > modelrundate | smallint |
> > stn | character(4) | not null
> > rh | numeric(6,2) |
> > wdsp | numeric(6,2) |
> > wddir | character varying(2) |
> > dwpt | numeric(6,2) |
> > lpre | numeric(6,2) |
> > yearmoda | date | not null
> >
> > It's a table for weather forecasts, a record is identified uniquely by
> > (model, stn, yearmoda, modelruntime, modelhr) although I will rarely have a
> > query that specifies all of those fields.
We need to run the db 24x7. Data comes in all day long, one model run at a
time. A model run is anywhere from 10-40 hours interpolated over about 1800
weather stations. All we do is add data (and, occasionally, re-insert data
that was found corrupt)...no deleting, though. Would you recommend doing a
vacuum analyze after every model run, or would once a day be sufficient?
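In case it helps to be concrete, I was picturing just tacking something like this onto the end of the load script (table names from the normalized layout above; with the current single table it would just be the one VACUUM ANALYZE on forecast):

VACUUM ANALYZE forecast;
VACUUM ANALYZE temperature;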
> You also mentioned issues with downtime. Is this DB going to be used
> 24x7? You need to do vacuum analyze at least after every big update.
>
> How often will your data be updated? Once data is loaded will it be
> changed at all?
>