Re: Replication on the backend

From: "J(dot) Andrew Rogers" <jrogers(at)neopolitan(dot)com>
To: Gregory Maxwell <gmaxwell(at)gmail(dot)com>
Cc: Jan Wieck <JanWieck(at)yahoo(dot)com>, Mario Weilguni <mario(dot)weilguni(at)icomedias(dot)com>, Chris Browne <cbbrowne(at)acm(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Replication on the backend
Date: 2005-12-07 09:26:15
Message-ID: DC5354B1-808C-4E1A-9EDA-C7084C4914B1@neopolitan.com
Lists: pgsql-hackers


On Dec 6, 2005, at 9:09 PM, Gregory Maxwell wrote:
> Eh, why would light limited delay be any slower than a disk on FC the
> same distance away? :)
>
> In any case, performance of PG on iscsi is just fine. You can't blame
> the network... Doing multimaster replication is hard because the
> locking primitives that are fine on a simple multiprocessor system
> (with a VERY high bandwidth very low latency interconnect between
> processors) just don't work across a network, so you're left finding
> other methods and making them work...

Speed of light latency shows up pretty damn often in real networks,
even relatively local ones. The number of people who wonder why a
transcontinental SLA of 10ms is not possible is astonishing. Silicon
switch fabrics add latency measured in nanoseconds, which is
effectively zero for many networks that leave the system board, so
most well-designed networks are limited by how fast one can push
photons through a fiber, which is significantly slower than photons
through a vacuum.
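
To put rough numbers on the 10ms point, here is a back-of-the-envelope
sketch. The refractive index (1.47) and the 4000 km path length are
assumptions chosen for illustration, not measurements of any real
route, and fiber_one_way_ms is just a name made up for this example.
It only counts propagation delay, so real links will be slower.

```python
C_VACUUM_M_PER_S = 299_792_458  # speed of light in vacuum (exact by definition)
FIBER_REFRACTIVE_INDEX = 1.47   # assumed typical value for silica fiber


def fiber_one_way_ms(distance_km, refractive_index=FIBER_REFRACTIVE_INDEX):
    """Propagation delay in milliseconds over a fiber run of the given length.

    Ignores switching, queuing and routing detours, so it is a lower bound.
    """
    speed_m_per_s = C_VACUUM_M_PER_S / refractive_index
    seconds = distance_km * 1000.0 / speed_m_per_s
    return seconds * 1000.0


if __name__ == "__main__":
    distance_km = 4000  # assumed rough coast-to-coast path length
    one_way = fiber_one_way_ms(distance_km)
    print(f"one way: {one_way:.1f} ms, round trip: {2 * one_way:.1f} ms")
```

Under those assumptions the one-way delay alone comes out at roughly
20 ms, before any switch or router is involved.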

Compared to a simple single-system SMP, a local cluster built on a
first-rate fabric will have about an order of magnitude higher
latency but very similar bandwidth. On the other hand, at those
latencies you can increase the number of addressable processors with
that kind of bandwidth by an order of magnitude, so it is a bit of a
trade. Latency still matters a lot, though: one would have to be
much smarter about partitioning synchronization across that fabric,
even though one would lose nothing in the bandwidth department.

> But again, multimaster isn't hard because of some inherently
> slow property of networks.

Eh? As far as I know, the difficulty of multi-master is almost
entirely a product of the latency of real networks, which are too
slow for scalable distributed locks. SMP is little more than a
distributed lock manager implemented in silicon. Multi-master is
therefore hard in practice because we cannot drive networks fast
enough. That said, current state-of-the-art network fabrics are
within an order of magnitude of SMP fabrics, so they could be
real contenders, particularly once you get north of 8-16 processors.
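
A crude way to see why latency dominates distributed locking: if a
single contended lock has to be handed from node to node, each handoff
costs at least one round trip. The sketch below assumes exactly one
round trip per handoff and zero processing time. The latency figures
are hypothetical placeholders picked only to show the scaling (the
cluster value is simply 10x the SMP value), and
max_lock_handoffs_per_sec is a made-up name for this example.

```python
def max_lock_handoffs_per_sec(one_way_latency_s):
    """Upper bound on strictly serialized handoffs of one contended lock.

    Assumes each handoff costs one full round trip (request plus grant)
    and ignores all processing time, so real systems will do worse.
    """
    return 1.0 / (2.0 * one_way_latency_s)


# Hypothetical one-way latencies, for illustration only.
ASSUMED_LATENCIES_S = {
    "SMP interconnect": 100e-9,
    "local cluster fabric": 1e-6,   # an order of magnitude above SMP
    "transcontinental fiber": 20e-3,
}

if __name__ == "__main__":
    for name, latency in ASSUMED_LATENCIES_S.items():
        rate = max_lock_handoffs_per_sec(latency)
        print(f"{name}: at most {rate:,.0f} handoffs/s")
```

The absolute numbers mean little; the point is that the ceiling on
serialized lock traffic scales inversely with latency, regardless of
how much bandwidth the fabric has.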

The really sweet potential is in Opteron system boards with
Infiniband directly attached to HyperTransport. At that level of
bandwidth and latency, both per node and per switch fabric, the
architecture possibilities start to become intriguing.

J. Andrew Rogers
