Re: Homegrown Data Warehouse

From: Keith <keith(at)keithf4(dot)com>
To: Binand Sethumadhavan <binand(at)gmail(dot)com>
Cc: PGSQL-Novice <pgsql-novice(at)postgresql(dot)org>
Subject: Re: Homegrown Data Warehouse
Date: 2016-07-12 14:01:05
Message-ID: CAHw75vsFtXTADfz3vS7MGqjV1UHvbJgiaoK210-Cp9YNadKepw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

On Mon, Jul 11, 2016 at 11:01 PM, Binand Sethumadhavan <binand(at)gmail(dot)com>
wrote:

> We have a postgresql 9.2 instance pair (master+slave) supporting our
> online application.
> We have a postgresql 9.5 instance which contains a dump of the above
> database.
>
> The primary purpose of the dump database is to facilitate analysis. It
> is updated from the slave by way of a perl script and psql. Master has
> archival policies etc. in place but dump contains complete historic
> data.
>
> My problem is that when the dump job runs, queries pause or when heavy
> analysis queries run, dump process pauses. This creates significant
> problems. One particular use-case has a production application picking
> analyzed data from the dump database and running a campaign based on
> that.
>
> It is basically a hack to implement data warehouse, ETL, datamarts,
> analysis & reporting and campaign management via a set of
> perl/shell/sql scripts. It used to work fine, but now with analysis
> workloads increasing it is creating problems.
>
> I'd like to know where to go next; because at the moment I'm out of
> ideas. I do not have the budget for a paid solution.
>
> Basically, I'm looking for suggestions on architecture and
> implementation of such a system. If there are any web references
> please mention them too.
>
> TIA,
>
> Binand
>
>
> --
> Sent via pgsql-novice mailing list (pgsql-novice(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-novice
>

We use a tool that I've been working on called mimeo to do logical
(per-table) replication to move data to a data warehouse system like you
have. Mimeo has several replication methods available, the most commonly
used being trigger based replication (DML replay). Some other replication
methods that can be more efficient are available as well, but only work in
specific situations.

https://github.com/omniti-labs/mimeo

There's another extension in the works called pglogical, which uses the
newer logical WAL streaming method available in 9.4. It's much more
efficient, but not quite as flexible and also requires all systems involved
be on the same version.

https://2ndquadrant.com/en/resources/pglogical/

Keith
https://www.keithf4.com/

In response to

Browse pgsql-novice by date

  From Date Subject
Next Message Nigel Straightgrain 2016-07-13 03:54:42 How to upgrade from PostgreSQL v9.1.2 to v9.5.3?
Previous Message Kenneth Marshall 2016-07-12 12:59:16 Re: Homegrown Data Warehouse