From: | John McKown <john(dot)archie(dot)mckown(at)gmail(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Design ? table vs. view? |
Date: | 2014-07-15 12:33:56 |
Message-ID: | CAAJSdjjxMFEJLoeBEEWzTp6LZChVVwC0sdK8io9r=bc-rJaiTQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
I have a table which has some "raw" data in it. By "raw", I mean it is
minimally processed from a log file. Every week, I update this table by
processing the weekly log using awk to create a "psql script" file which
looks similar to:
COPY rawdata FROM STDIN;
.... lines created by awk script
\.
The table schema is:
Table "jobrun.rawdata"
Column | Type |
----------+---------------+
lpar | character(4) |
yyddd | character(5) |
timedata | character(11) |
jobid | character(8) |
msgid | character(7) |
jobname | character(8) |
Now, this data is not really very useful in its raw form. So I "process" it
via a view:
View "jobrun.rundata"
Column | Type |
---------+--------------------------+
lpar | character(4) |
msgid | character(7) |
jobname | character(8) |
jobid | character(8) |
msgtime | timestamp with time zone |
View definition:
SELECT rawdata.lpar,
rawdata.msgid,
rawdata.jobname,
rawdata.jobid,
to_timestamp((rawdata.yyddd::text || ' '::text) ||
rawdata.timedata::text, 'YYDDD HH24:MI:SS.MS'::text) AS msgtime
FROM rawdata;
My question is this: If I do a number of SELECTs on the "rundata" table.
So, would it be worth while to make this a table in itself? The plus of a
view is that I don't need to worry about updates. And I still have the
"raw" data around. In reality, this is just the first VIEW. I create three
other views. Two views are to "subset" the data based on the contents of
the "msgid" value (there are only 2 possibilities at present: 'START' and
'END'). The final view, which is my actual information is a FULL OUTER JOIN
of the START and END subset, based on lpar,jobname, and jobid:
View "jobrun.runinfo"
Column | Type |
----------+--------------------------+
lpar | character(4) |
jobname | character(8) |
jobid | character(8) |
runstart | timestamp with time zone |
runend | timestamp with time zone |
View definition:
SELECT COALESCE(a.lpar, b.lpar) AS lpar,
COALESCE(a.jobname, b.jobname) AS jobname,
COALESCE(a.jobid, b.jobid) AS jobid,
a.msgtime AS runstart,
b.msgtime AS runend
FROM runstart a
FULL JOIN runend b ON a.lpar = b.lpar AND a.jobname = b.jobname AND
a.jobid = b.jobid;
So the overhead may be quite high, because to SELECT from RUNINFO,
PostgreSQL must realize all four views.
I appreciate your thoughts on if this is OK, given that performance is
currently acceptable. Mainly because this work is basically only done one a
week, on Sundays. And I don't do it myself, it is done via a scheduler (not
cron, but similar) which runs some scripts.
--
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan
Maranatha! <><
John McKown
From | Date | Subject | |
---|---|---|---|
Next Message | basti | 2014-07-15 12:52:12 | Re: php password authentication failed for user ... |
Previous Message | Marc Mamin | 2014-07-15 11:26:47 | FW: operator is not unique: smallint[] @> smallint[] You might need to add explicit type casts (!) |