Re: Strategy for materialisation and centralisation of data

From: Michael Stephenson <domehead100(at)gmail(dot)com>
To:
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Strategy for materialisation and centralisation of data
Date: 2020-05-21 21:06:48
Message-ID: CAHJZW0=zDHUhrKGAPYE-ekRT9FRmHfsASSnpHev+cC-5K9=bbQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

You might find Materialize interesting:

https://materialize.io/

https://youtu.be/zWSdkGq1XWk

On Thu, May 21, 2020 at 10:36 AM Rory Campbell-Lange <
rory(at)campbell-lange(dot)net> wrote:

> We have quite a few databases of type a and many of type b in a cluster.
> Both a and b types are fairly complex and are different solutions to a
> similar problem domain. All the databases are very read-centric, and all
> database interaction is currently through plpgsql with no materialised
> data.
>
> Some organisations have several type a and many type b databases, and
> need to query these in a homogeneous manner. We presently do this with
> many middleware requests or pl/proxy. An a or b type database belongs to
> 0 or 1 organisations.
>
> Making a and b generally the same would be a very big project.
> Consequently I'm discussing materialising a subset of data in a common
> format between the two database types and shipping that data to
> organisation databases. This would have the benefit of providing a
> common data interface and speeding up queries for all database types.
> Users would have faster queries, and it would be a big time saver for
> our development team, who presently have to deal with three quite
> different data APIs.
>
> Presently I've been thinking of using triggers or materialized views in
> each database to materialise data into a "matview" schema which is then
> shipped via logical replication to an organisation database when
> required. New columns in the matview schema tables would ensure replica
> identity uniqueness and allow the data to be safely stored in common
> tables in the organisation database.
>
> A few issues I foresee with this approach include:
>
> * requiring two to three times current storage for materialisation
> (the cluster is currently ~250GB)
>
> * having to have many logical replication slots
> (we sometimes suffer from pl/proxy connection storms)
>
> Commentary gratefully received,
> Rory
>
>
>
>
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Rory Campbell-Lange 2020-05-21 21:19:21 Re: Strategy for materialisation and centralisation of data
Previous Message Zahir Lalani 2020-05-21 20:58:32 RE: Should I use JSON?