Quick Links

Re: Streaming replication and sharding

From:	Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To:	"Tiemo Kieft EXTERN" <t(dot)kieft(at)infinitesimal(dot)nl>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Streaming replication and sharding
Date:	2013-02-18 08:00:33
Message-ID:	A737B7A37273E048B164557ADEF4A58B057B470A@ntex2010a.host.magwien.gv.at
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Tiemo Kieft wrote:
> We are developing an application that uses various web analytics packages (like Google Analytics) to
> run analyses on. We are currently in closed beta stadium where we don't have a lot of data in the
> database, but at some point it will grow considerably.
>
> We basically have two different sets of data, on the one hand we have raw metrics from the datasource,
> and on the other hand we have account and meta information. The former can be re-downloaded at any
> time, and will grow to quite large sizes. The latter set is the one that we really care about, and
> don't want to risk losing.
>
> Currently we plan on using streaming replication to replicate all data to at least one slave, for the
> near future this will do, since we can run some of the large (read-only) aggregation queries on the
> slave database. In the future the dataset might grow to the point where we need to start thinking
> about sharding. The analytics data can be sharded on a per-customer basis, and doesn't have to be
> replicated.
>
> Since Postgres doesn't support per-table streaming replication (as far as I can tell), the only
> solution would be to run two separate instances of postgres per server. One instance is replicated to
> all servers, and will contain account and other important information. The other instance is used to
> store analytics data. Is this a viable way of solving this problem, or are we overlooking something?
>
> The problem is not really immediate, as the dataset is currently small enough to fit on one machine
> (and replicated to a second), just want to be future proof, and get this solved before the problems
> start.

The problems I see with distributing your data across
several PostgreSQL clusters is that they become disconnected.

It will become much more difficult to keep them consistent:

You cannot have referential integrity, and if a database restore
is needed, you have to make extra provisions that you can restore
your system to a consistent state across all clusters.

You also lose the ability to join between tables that are
distributed across different databases. This can be a
perdormance problem, particularly in an OLAP scenario.

That's all I can think of at the moment.

Yours,
Laurenz Albe

In response to

Streaming replication and sharding at 2013-02-17 08:02:29 from Tiemo Kieft

Responses

Re: Streaming replication and sharding at 2013-02-18 20:00:53 from René Romero Benavides
Re: Streaming replication and sharding at 2013-02-18 21:01:28 from René Romero Benavides

Browse pgsql-general by date

	From	Date	Subject
Next Message	Thomas Kellerer	2013-02-18 08:17:07	Re: Visual query builder for PosgreSQL?
Previous Message	Michael Harris	2013-02-18 03:31:32	Re: Hot Standby has PANIC: WAL contains references to invalid pages