From: | David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Design ? table vs. view? |
Date: | 2014-07-15 13:46:05 |
Message-ID: | 1405431965101-5811589.post@n5.nabble.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
John McKown wrote
> I have a table which has some "raw" data in it. By "raw", I mean it is
> minimally processed from a log file. Every week, I update this table by
> processing the weekly log using awk to create a "psql script" file which
> looks similar to:
>
> COPY rawdata FROM STDIN;
> .... lines created by awk script
> \.
>
> The table schema is:
> Table "jobrun.rawdata"
> Column | Type |
> ----------+---------------+
> lpar | character(4) |
> yyddd | character(5) |
> timedata | character(11) |
> jobid | character(8) |
> msgid | character(7) |
> jobname | character(8) |
>
> Now, this data is not really very useful in its raw form. So I "process"
> it
> via a view:
>
> View "jobrun.rundata"
> Column | Type |
> ---------+--------------------------+
> lpar | character(4) |
> msgid | character(7) |
> jobname | character(8) |
> jobid | character(8) |
> msgtime | timestamp with time zone |
> View definition:
> SELECT rawdata.lpar,
> rawdata.msgid,
> rawdata.jobname,
> rawdata.jobid,
> to_timestamp((rawdata.yyddd::text || ' '::text) ||
> rawdata.timedata::text, 'YYDDD HH24:MI:SS.MS'::text) AS msgtime
> FROM rawdata;
>
> My question is this: If I do a number of SELECTs on the "rundata" table.
> So, would it be worth while to make this a table in itself? The plus of a
> view is that I don't need to worry about updates. And I still have the
> "raw" data around. In reality, this is just the first VIEW. I create three
> other views. Two views are to "subset" the data based on the contents of
> the "msgid" value (there are only 2 possibilities at present: 'START' and
> 'END'). The final view, which is my actual information is a FULL OUTER
> JOIN
> of the START and END subset, based on lpar,jobname, and jobid:
>
> View "jobrun.runinfo"
> Column | Type |
> ----------+--------------------------+
> lpar | character(4) |
> jobname | character(8) |
> jobid | character(8) |
> runstart | timestamp with time zone |
> runend | timestamp with time zone |
> View definition:
> SELECT COALESCE(a.lpar, b.lpar) AS lpar,
> COALESCE(a.jobname, b.jobname) AS jobname,
> COALESCE(a.jobid, b.jobid) AS jobid,
> a.msgtime AS runstart,
> b.msgtime AS runend
> FROM runstart a
> FULL JOIN runend b ON a.lpar = b.lpar AND a.jobname = b.jobname AND
> a.jobid = b.jobid;
>
> So the overhead may be quite high, because to SELECT from RUNINFO,
> PostgreSQL must realize all four views.
>
> I appreciate your thoughts on if this is OK, given that performance is
> currently acceptable. Mainly because this work is basically only done one
> a
> week, on Sundays. And I don't do it myself, it is done via a scheduler
> (not
> cron, but similar) which runs some scripts.
I would likely make "jobrun.runinfo" into a table while leaving
"jobrun.rawdata" as-is. I would have a function that populates "runinfo"
from "rawdata" that I would call after performing the copy to "rawdata".
There would be no views - unless you desire a view interface over "runinfo"
for API or permission reasons.
In 9.4 you can (probably) make "runinfo" an explicit MATERIALIZED VIEW and
perform REFRESH command to accomplish the same thing - though I am not
particularly familiar with the mechanics of that feature.
David J.
--
View this message in context: http://postgresql.1045698.n5.nabble.com/Design-table-vs-view-tp5811577p5811589.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
From | Date | Subject | |
---|---|---|---|
Next Message | Adrian Klaver | 2014-07-15 14:00:06 | Re: php password authentication failed for user ... |
Previous Message | Néstor Boscán | 2014-07-15 13:07:29 | Re: Is there a way to get an update date for objects in pg_class |