Re: Replication on the backend

From:	"J(dot) Andrew Rogers" <jrogers(at)neopolitan(dot)com>
To:	Markus Schiltknecht <markus(at)bluegap(dot)ch>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Replication on the backend
Date:	2005-12-07 09:04:24
Message-ID:	3FA96BF2-6A09-48FD-9695-381AC1513A10@neopolitan.com
Lists:	pgsql-hackers

On Dec 6, 2005, at 11:42 PM, Markus Schiltknecht wrote:
> Does anybody have latency / roundtrip measurements for current
> hardware?
> I'm interested in:
> 1Gb Ethernet,
> 10 Gb Ethernet,
> InfiniBand,
> probably even p2p usb2 or firewire links?

In another secret life, I know a bit about supercomputing fabrics.
The latency metrics have to be thoroughly qualified.

First, most of the RTT latency numbers for network fabrics are for
0-byte packets, which really do not apply to anyone shuffling
real data around. For small packets, high-performance fabrics (HTX
InfiniBand, Quadrics, etc.) have approximately an order of magnitude
less latency than vanilla Ethernet, though the performance specifics
depend greatly on the actual usage. For large packet sizes, the
differences in latency become far less obvious. However, for "real"
packets a performant fabric will still look very good compared to
disk systems. Switched fiber fabrics have enough relatively
inexpensive throughput now to saturate most disk systems and CPU I/O
busses; only platforms like HyperTransport can really keep up. It is
worth pointing out that the latency of high-end network fabrics is
similar to large NUMA fabrics, which exposes some of the limits of
SMP scalability. As a point of reference, an organization that knows
what they are doing should have no problem getting 500 microsecond
RTT on a vanilla metropolitan area GigE fiber network -- a few
network operators actually do deliver this on a regional scale. For
a local cluster, a competent design can best this by orders of
magnitude.
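
To illustrate why the latency gap matters for small packets and fades for large ones, here is a minimal sketch of a back-of-the-envelope model: round trip = 2 x (one-way latency + payload / bandwidth). The function name and every number in it are illustrative assumptions, not measurements. The only thing taken from the text above is the rough order-of-magnitude ratio in base latency, and both links are given the same bandwidth purely to isolate the latency effect.

```python
def round_trip_us(payload_bytes, one_way_latency_us, bandwidth_gbit_s):
    """Estimate the round-trip time in microseconds for echoing a payload.

    Simplified model (an assumption, not a measurement): each direction
    costs a fixed one-way latency plus the time to serialize the payload.
    """
    # 1 Gbit/s moves 1000 bits per microsecond.
    serialization_us = payload_bytes * 8 / (bandwidth_gbit_s * 1000.0)
    return 2 * (one_way_latency_us + serialization_us)


# Hypothetical parameters: base latencies differ by 10x, same bandwidth.
ETHERNET_LATENCY_US = 50.0   # assumed, for illustration only
FABRIC_LATENCY_US = 5.0      # assumed, for illustration only
BANDWIDTH_GBIT_S = 1.0       # assumed identical to isolate latency

if __name__ == "__main__":
    for size in (0, 1500, 64 * 1024, 1024 * 1024):
        eth = round_trip_us(size, ETHERNET_LATENCY_US, BANDWIDTH_GBIT_S)
        fab = round_trip_us(size, FABRIC_LATENCY_US, BANDWIDTH_GBIT_S)
        print(f"{size:>8} bytes: ethernet {eth:10.1f} us, "
              f"fabric {fab:10.1f} us, ratio {eth / fab:.2f}")
```

Under these assumed numbers the ratio starts at 10 for an empty packet and approaches 1 as the payload grows, because serialization time swamps the fixed latency. Real fabrics also differ in bandwidth and protocol overhead, which this sketch deliberately ignores.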

There are a number of silicon limitations, but a system that connects
the fabric directly to HyperTransport can drive several GB/s with
very respectable microsecond latencies if the rest of the system is
up to it. There are Opteron system boards now that will drive
Infiniband directly from HyperTransport. I know Arima/Rioworks makes
some (great server boards generally), and several other companies are
either making them or have announced them in the pipeline. These
Opteron boards get pretty damn close to Big Iron SMP fabric
performance in a cheap package. Given how many companies have
announced plans to produce Opteron server boards with Infiniband
fabrics directly integrated into HyperTransport, I would say that
this is the future of server boards.

And if PostgreSQL could actually use an InfiniBand fabric for
clustering a single database instance across Opteron servers, that
would be very impressive...

J. Andrew Rogers
