Design of live and archive database

From: Genc, Ömer <Oemer(dot)Genc(at)iais(dot)fraunhofer(dot)de>
To: "pgsql-cluster-hackers(at)postgresql(dot)org" <pgsql-cluster-hackers(at)postgresql(dot)org>
Subject: Design of live and archive database
Date: 2015-07-09 11:52:20
Message-ID: 3eff99123d744043a47cd590f1a80d1c@e2k13-ms1.iais.fraunhofer.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-cluster-hackers

Hey there,

i am pretty new to postgres, so please bear with me.
Currently I am working on a project which serializes data into a postgres database.
This is a real-time streaming data visualization system with a high frequency of incoming data, which need to be stored in the database.
You can think about continuously 1000 inserts per second. This data needs to be stored for ever, since it should be possible
to restore the system to any state is has ever been in at any time.
However during runtime only the data for the past hour is frequently requested,
which is not very much compared to the overall data size.

As you can image, the database becomes very large in size over time and queries are getting slower and slower. I already
implemented partitioning using time-based approach for corresponding tables. This approach worked and were satisfactory at a first glance.
However since this is an always running system, there is very much data (in form of old partition tables) laying around cluttering the database.
Now I decided to take a different approach. I am trying to separate between two database clusters, one live cluster and one archive cluster.
The live cluster will only maintain the last hour of data, the archive cluster will keep all the data of all time.

Is there anyone who ever implemented a similar database architecture? Can anyone give me an advice, if this is a good approach?
And if so, what would be the best way to implement this?

The approach Im going to implement is:
Currently I have one application server which handles all queries against the database. Now I thought about implementing a multiplexer, which instead of inserting into on database also inserts into the second archive database. While triggers on the live system would remove data which is older than on hour, the archive database would partition data on a monthly approach.

I simply don't want to miss an obvious solution to this, so any advice would be greatly appreciated.

Cheers,
Oemer

Browse pgsql-cluster-hackers by date

  From Date Subject
Next Message Rainer Brestan 2015-08-17 10:16:50 [pg_rewind] Request for command line switch to prevent removal of stored procedures
Previous Message אביאל בוסקילה 2015-06-18 14:44:03 Implementing PostgreSQL in High Availability