From: | Alex Grund <st(dot)helldiver(at)googlemail(dot)com> |
---|---|
To: | pgsql-sql(at)postgresql(dot)org |
Subject: | Question regarding modelling for time series |
Date: | 2012-09-04 20:16:05 |
Message-ID: | CACjun4+niDs1FfZYc4PQ7hH+XepSVu_oLHBdXdkOhjgsnOeW-A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
Hi there,
I want to use a database for storing economic time series.
An economic time series can be thought of as something like this:
NAME | RELEASE_DATE | REPORTING_DATE | VALUE
----------------+--------------+----------------+-------
Unemployment US | 2011/01/01 | 2010/12/01 | xxx
Unemployment US | 2011/02/01 | 2011/01/01 | xxx
Unemployment US | 2011/03/01 | 2011/02/01 | xxx
The release date is the date on which the data provider published the
value and the reporting date is the date to which the value refers
(read: In Dec, 2010 the unemployment was X but this has not been known
until 2011/01/01).
However, that's not the whole story. On each "release date" not only
ONE value is released but in some cases the values for previous
reporting_dates are changed.
So, the table could read like this:
NAME | RELEASE_DATE | REPORTING_DATE | VALUE
----------------+--------------+----------------+-------
Unemployment US | 2011/01/01 | 2010/12/01 | xxx
Unemployment US | 2011/01/01 | 2010/11/01 | xxx
Unemployment US | 2011/01/01 | 2010/10/01 | xxx
Unemployment US | 2011/02/01 | 2010/10/01 | xxx
Unemployment US | 2011/02/01 | 2010/11/01 | xxx
Unemployment US | 2011/02/01 | 2010/12/01 | xxx
Unemployment US | 2011/02/01 | 2011/01/01 | xxx
[...]
So, there are now mainly three questions to be answered:
1) "get me the time series [reporting_date, value] of unemployment as
it is now seen", so give all reporting_date,value tuples with the most
recent release_date.
2) "get me the time series [reporting_date, value] as it was
published/known to the market", so that means, in this case, give this
list:
Unemployment US | 2011/01/01 | 2010/12/01 | xxx
Unemployment US | 2011/02/01 | 2011/01/01 | xxx
Unemployment US | 2011/03/01 | 2011/02/01 | xxx
3) the same as (1) but with one enhancement: if the most recent
release has a history of N month, but all releases has a history of
N+X month, the time series from the most recent release should be
delivered plus the older values (in terms of reporting_dates) from the
second most recent release plus the more older values from the third
most recent release and so on.
So, I thought of a relational data base model like that:
TABLE 'ts' (TimeSeries)
PK:id | name
TABLE 'rs' (ReleaseStages)
PK:id | FK:ts_id | release_date
TABLE 'r' (Releases)
PK:id | FK:rs_id | reporting_date | value
Is this an appropriate model?
If yes, how could I answer the three questions above in terms of
SQL/Stored Procedures?
If no, what would you suggest?
And: If the datasets grow further, will be an RDBMS the right model
for time series storage? Any ideas on what else I could use?
Thank you very much!
--Alex
From | Date | Subject | |
---|---|---|---|
Next Message | Sergey Konoplev | 2012-09-05 05:36:59 | Re: Question regarding modelling for time series |
Previous Message | Sergey Konoplev | 2012-09-04 09:56:49 | Re: Need to Iterate the record in plpgsql |