From: | Paul Jungwirth <pj(at)illuminatedcomputing(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: journaling / time travel |
Date: | 2016-09-19 16:26:36 |
Message-ID: | a6e39c04-969d-993c-e5a1-8aca85d68761@illuminatedcomputing.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On 09/19/2016 08:48 AM, Willy-Bas Loos wrote:
> Since records can
> be changed afterwards, it has been argued that we should have
> "journaling", meaning that every change to the data is saved in a
> separate schema that holds a "journaling" copy of each table
I don't think this is especially unusual. I've worked on similar
projects that maintain history for regulatory reasons. I can't speak to
your legal situation, but I don't think keeping history is a strange choice.
Re the security argument: it seems to me it depends on whether you can
restrict access to the journal while providing access to the normal
tables. I guess journaling would help for some threats but not others,
right? If regular users can't touch the journal, then I agree keeping
one could make sense.
Are you saying the journaling portion is 624GB on top of 1.1TB? Or that
of the 1.1TB, 624GB of it is from journaling? Either way it doesn't seem
like a massive cost to me.
I don't think PITR is an appropriate solution to keeping a 10-year
history of changes.
It sounds like you're not looking for a green-field solution, but just
trying to get perspective on what others are doing. Some resources for
this that might help you:
https://www.youtube.com/watch?v=TRgni5q0YM8
https://github.com/arkhipov/temporal_tables (aka
http://pgxn.org/dist/temporal_tables/)
http://www.cs.arizona.edu/~rts/tdbbook.pdf (also available in print)
https://www.amazon.com/Bitemporal-Data-Practice-Tom-Johnston/dp/0124080677
The two books contain multiple approaches to storing history, each with
tradeoffs for ease-of-use, disk space, etc. Reading them might be
overkill for you. If you want to choose one, I'd read Snodgrass. The
first few chapters are tedious, but it gets better.
Temporal databases are in a funny in-between zone where there is lots of
research, but standard tools are fairly underdeveloped. Postgres
recently added range types and exclusion constraints, which are
important primitives for building a temporal system, but it is still a
long way from SQL:2011, and SQL:2011 is itself a long way from
everything you might want. One thing that seems lacking to me, even in
the research, is how to handle DDL changes. You should be glad that you
only care about audit history and not subject history too, because going
bi-temporal is where you really cross over into lack of available tools,
outside of a few commercial offerings. (Teradata has temporal support,
using a Snodgrass-like approach that pre-dates the standard.)
Paul
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Janes | 2016-09-19 16:41:33 | Re: performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains |
Previous Message | Willy-Bas Loos | 2016-09-19 16:04:25 | Re: journaling / time travel |