Re: Horizontal scalability/sharding

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Horizontal scalability/sharding
Date: 2015-09-03 02:26:11
Message-ID: CAB7nPqTNeTB0wNHha7awfvQb8Rj78pkd03ti0nBOrktZBH_vUA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 3, 2015 at 3:41 AM, Robert Haas wrote:
> 3. IIUC, Postgres-XC handles this problem by reducing at least
> volatile functions, maybe all functions, to constants. Then it
> generates an SQL statement to be sent to the data node to make the
> appropriate change. If there's more than one copy of the data, we
> send a separate copy of the SQL statement to every node. I'm not sure
> exactly what happens if some of those nodes are not available, but I
> don't think it's anything good. Fundamentally, this model doesn't
> allow for many good options in that case.

I don't recall it that way. Immutable functions are replaced by constants in
the query sent to the datanodes. Volatile and stable functions are
evaluated locally after fetching the results from the remote node, which
is not that efficient for warehouse loads. My 2c.
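To illustrate the split described above, here is a minimal sketch of a
coordinator deciding which function calls to fold into constants before
shipping the query and which to keep for local evaluation. It is not
Postgres-XC code: FuncCall, split_by_volatility and the example function
names are hypothetical, and a real planner works on expression trees rather
than a flat list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class FuncCall:
    """A function call in a query, tagged with its volatility class."""
    text: str                     # how the call appears in the query
    volatility: str               # "immutable", "stable" or "volatile"
    impl: Callable[[], object]    # stand-in for evaluating the call


def split_by_volatility(calls: List[FuncCall]) -> Tuple[Dict[str, object], List[str]]:
    """Return (constants to ship to the datanodes, calls kept on the coordinator)."""
    shipped: Dict[str, object] = {}
    local: List[str] = []
    for call in calls:
        if call.volatility == "immutable":
            # Same inputs always give the same result, so it is safe to
            # evaluate once and send the constant to the remote node.
            shipped[call.text] = call.impl()
        else:
            # Stable and volatile calls are evaluated locally, after the
            # rows have been fetched from the remote node.
            local.append(call.text)
    return shipped, local
```

With an immutable abs(-5), a stable now() and a volatile random(), only
abs(-5) ends up as a constant in the remote query. The other two stay on the
coordinator, which is where the extra row fetching cost comes from.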

> 4. Therefore, I think that we should instead use logical replication,
> which might be either synchronous or asynchronous. When you modify
> one copy of the data, that change will then be replicated to all other
> nodes. If you are OK with eventual consistency, this replication can
> be asynchronous, and nodes that are off-line will catch up when they
> are on-line. If you are not OK with that, then you must replicate
> synchronously to every node before transaction commit; or at least you
> must replicate synchronously to every node that is currently on-line.
> This presents some challenges: logical decoding currently can't
> replicate transactions that are still in process - replication starts
> when the transaction commits. Also, we don't have any way for
> synchronous replication to wait for multiple nodes.

That's something that the quorum synchronous replication patch would
address. Still, being able to be synchronous across multiple nodes does
not seem to be at the top of the list.
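As a rough sketch of what waiting for multiple nodes means under a quorum
rule: the commit returns once enough of the synchronous candidates have
confirmed it. The function and parameter names below are hypothetical, and
the exact semantics of the quorum patch are an assumption here, not
something taken from the patch itself.

```python
from typing import Iterable


def commit_can_return(acked: Iterable[str],
                      sync_candidates: Iterable[str],
                      quorum: int) -> bool:
    """True once at least `quorum` synchronous candidates confirmed the commit."""
    # Acknowledgements from nodes that are not synchronous candidates
    # (for example asynchronous standbys) do not count toward the quorum.
    confirmed = set(acked) & set(sync_candidates)
    return len(confirmed) >= quorum
```

Setting quorum to the number of candidates gives the "every node" case from
the quoted text, while a smaller value lets the commit proceed when some
nodes are off-line.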

> Also, the GTM needs to be aware that this stuff is happening, or it will DTWT. That too seems like a problem that can be solved.

If I understood correctly, yes, it is: with its centralized transaction
facility, each node is aware of the transaction status via the global
snapshot.
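
For what it's worth, the idea of a node consulting a global snapshot can be
sketched as below. This is only an illustration under simplifying
assumptions: GlobalSnapshot and xid_visible are hypothetical names, the
field layout mimics an ordinary MVCC snapshot, and transaction ID wraparound
is ignored.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class GlobalSnapshot:
    """Snapshot handed out by a central transaction manager (hypothetical)."""
    xmin: int                # every xid below this had finished
    xmax: int                # first xid not yet assigned at snapshot time
    running: FrozenSet[int]  # xids in progress at snapshot time


def xid_visible(snap: GlobalSnapshot, xid: int, committed: Set[int]) -> bool:
    """Decide whether the effects of `xid` are visible under `snap`."""
    if xid >= snap.xmax:
        return False         # started after the snapshot was taken
    if xid in snap.running:
        return False         # still in progress when the snapshot was taken
    return xid in committed  # finished before the snapshot: visible if committed
```

Because every node evaluates the same snapshot, they all reach the same
visibility decision for a given transaction.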
--
Michael
