From: | " René Romero Benavides" <ichbinrene(at)gmail(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Streaming replication and sharding |
Date: | 2013-02-18 20:00:53 |
Message-ID: | 3838932.rfQQDZ6cEG@beagle |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
I'm also interested in leveraging something like that, these are my thoughts:
- Have a master server with all the data
- Have the data partitioned vertically (inheritance, exclusion constrains,
etc)
- One synchronous slave
- N asynchronous slaves that feed from the synchronous slave (cascading
replication)
- Use plproxy to enforce that one instance serves queries of only one portion
of the data (pseudo-sharding). Also with plproxy is possible to parallelize
some queries.
Advantages:
+ Highly redundant
+ Lends itself to automatic failover to the syncrhonous slave
+ Read scalable
+ No synchronization conflicts among "shards"
+ Some queries could be parallelized
Disadvantages:
+ Plproxy works only with functions
+ Writes can be performed only in the master ( not write scalable)
+ Needs good communication infrastructure
+ Table structures need to be simple.
+ At some point (failover), you'll need to implement plproxy automatic re-
configuration
On Monday, February 18, 2013 08:00:33 AM Albe Laurenz wrote:
> Tiemo Kieft wrote:
>
> > We are developing an application that uses various web analytics packages
> > (like Google Analytics) to
run analyses on. We are currently in closed
> > beta stadium where we don't have a lot of data in the database, but at
> > some point it will grow considerably.
> >
> > We basically have two different sets of data, on the one hand we have raw
> > metrics from the datasource,
and on the other hand we have account and
> > meta information. The former can be re-downloaded at any time, and will
> > grow to quite large sizes. The latter set is the one that we really care
> > about, and don't want to risk losing.
> >
> > Currently we plan on using streaming replication to replicate all data to
> > at least one slave, for the
near future this will do, since we can run
> > some of the large (read-only) aggregation queries on the slave database.
> > In the future the dataset might grow to the point where we need to start
> > thinking about sharding. The analytics data can be sharded on a
> > per-customer basis, and doesn't have to be replicated.
> >
> > Since Postgres doesn't support per-table streaming replication (as far as
> > I can tell), the only
solution would be to run two separate instances of
> > postgres per server. One instance is replicated to all servers, and will
> > contain account and other important information. The other instance is
> > used to store analytics data. Is this a viable way of solving this
> > problem, or are we overlooking something?
> > The problem is not really immediate, as the dataset is currently small
> > enough to fit on one machine
(and replicated to a second), just want to
> > be future proof, and get this solved before the problems start.
>
>
> The problems I see with distributing your data across
> several PostgreSQL clusters is that they become disconnected.
>
> It will become much more difficult to keep them consistent:
>
> You cannot have referential integrity, and if a database restore
> is needed, you have to make extra provisions that you can restore
> your system to a consistent state across all clusters.
>
> You also lose the ability to join between tables that are
> distributed across different databases. This can be a
> perdormance problem, particularly in an OLAP scenario.
>
> That's all I can think of at the moment.
>
> Yours,
> Laurenz Albe
--
René Romero Benavides @iCodeiExist @PgsqlMx
Postgresql Tips en español para la comunidad de México e Hispanoamérica.
http://postgresql.org.mx
From | Date | Subject | |
---|---|---|---|
Next Message | René Romero Benavides | 2013-02-18 21:01:28 | Re: Streaming replication and sharding |
Previous Message | Seref Arikan | 2013-02-18 19:59:28 | Re: What happens if I create new threads from within a postgresql function? |