Quick Links

Re: Archive to secondary Postgres instance

From:	Evan Bauer <evanbauer(at)mac(dot)com>
To:	David Rydzewski <david(dot)rydzewski(at)gmail(dot)com>
Cc:	pgsql-admin(at)lists(dot)postgresql(dot)org
Subject:	Re: Archive to secondary Postgres instance
Date:	2018-09-01 14:29:50
Message-ID:	59930584-CF26-46B2-A4C9-B1AE77ED8ECC@mac.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-admin

David,

Your approach #1 is one that has been proven for years in both commerce and securities trading applications. The pitfalls in approach #2 have always made it too problematic for me — regardless of the DBMS involved.

A third approach is to use a change data capture (CDC) tool. These tools manage rules-based asymmetric replication between databases — it basically provides your option #1 with a logical definition of the transfer, provides both seeding and catchup capabilities, but without having to manage a ton of replication scripts. I’ve used the approach for large-scale financial services applications where the “trade” database of active transactions has to be incredibly fast (and therefore small with short-lived rows), and the "trade-history” database needs to maintain years of data for compliance and analysis. The asymmetric publisher to subscriber relationship allows for all kinds of interesting indexing and aggregation through materialized views in the history database that you could never afford in the “live” OLTP database.

Talend is the commercial open source leader in this area for database-to-database CDC replication; my limited experience with it (though not with PostgreSQL) has been good. Debezium is based on Apache Kafka, looks promising, but requires both an extension for PostgreSQL publishers and a custom application on the subscriber side. There are a number of commercial products in this space as well. IBM’s InfoSphere CDC product is the one I know best, but it is both proprietary and (when I last looked) had limited PostgreSQL support. There is a list of other alternatives on the PostgreSQL wiki at: https://wiki.postgresql.org/wiki/Ecosystem:Replication

Cheers,

- Evan

Evan Bauer
eb(at)evanbauer(dot)com
+1 646 641 2973
Skype: evanbauer

> On Aug 31, 2018, at 21:57, David Rydzewski <david(dot)rydzewski(at)gmail(dot)com> wrote:
>
> I have a postgres database that is doubling in size each year and contains many online orders that were delivered long ago and are now dormant in the database.
>
> My desire is to offload these dormant orders to a secondary Postgres instance, so that the information is still available for viewing purchase history, while keeping the active orders in the main, and eventually smaller Postgres instance.
>
> These orders live in a highly normalized model that spans over 30 tables.
>
> I have thought of two approaches to handle this "archiving" use case.
>
> One approach is pull based. As orders naturally go dormant, execute custom SQL to move this information into the secondary postgres instance. A separate process would then purge this order information from the main database after a period of time (e.g., custom deletes). The upside is that the process is fairly lightweight and straightforward. The downside is you have to maintain two data models and the SQL scripts whenever the data model changes.
>
> Second approach is logical replication. The upside is that you don't have to maintain two data models separately. The downside is that it becomes very tricky to differentiate between a "normal" delete, which you would want to apply on the secondary instance versus an "archive" delete which you wouldn't want to apply on the secondary. It could also get complex when replication fails and you have to re-initiate the stream.
>
> My question is whether anyone else has solved this and if so, how? I'm leaning toward pull-based, because replication doesn't feel right with the complexity around "selective" deletes, and other replication failure scenarios.
>
> Thanks!
>
>
>
>
>

In response to

Archive to secondary Postgres instance at 2018-09-01 01:57:56 from David Rydzewski

Browse pgsql-admin by date

	From	Date	Subject
Next Message	Tom Lane	2018-09-01 14:43:55	Re: pg_dump and disk full
Previous Message	Shridhar Bichchal	2018-09-01 13:32:39	Remove my email Id