Re: Horizontal scalability/sharding

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Mason S <masonlists(at)gmail(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Horizontal scalability/sharding
Date: 2015-09-01 17:17:36
Message-ID: CA+TgmoaH8hLOJB4tMjBmBst+tswVe3kUBmaJhiJB3cLQ1ksRpg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 1, 2015 at 1:06 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> You're assuming that our primary bottleneck for writes is IO. It's not
> at present for most users, and it certainly won't be in the future. You
> need to move your thinking on systems resources into the 21st century,
> instead of solving the resource problems from 15 years ago.

Your experience doesn't match mine. I find that it's frequently
impossible to get the system to use all of the available CPU capacity,
either because you're bottlenecked on locks or because you are
bottlenecked on the I/O subsystem, and with the locking improvements
in newer versions, the former is becoming less and less common.
Amit's recent work on scalability demonstrates this trend: he goes
looking for lock bottlenecks, and finds problems that only occur at
128+ concurrent connections running full tilt. The patches show
limited benefit - a few percentage points - at lesser concurrency
levels. Either there are other locking bottlenecks that limit
performance at lower client counts but which mysteriously disappear as
concurrency increases, which I would find surprising, or the limit is
somewhere else. I haven't seen any convincing evidence that the I/O
subsystem is the bottleneck, but I'm having a hard time figuring out
what else it could be.

> Our real future bottlenecks are:
>
> * ability to handle more than a few hundred connections
> * locking limits on the scalability of writes
> * ability to manage large RAM and data caches

I do agree that all of those things are problems, FWIW.

> Any sharding solution worth bothering with will solve some or all of the
> above by extending our ability to process requests across multiple
> nodes. Any solution which does not is merely an academic curiosity.

I think the right solution to those problems is to attack them
head-on. Sharding solutions should cater to use cases where using all
the resources of one machine isn't sufficient no matter how
efficiently we do it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
