Re: The plan for FDW-based sharding

From: Kevin Grittner <kgrittn(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: The plan for FDW-based sharding
Date: 2016-02-27 22:38:24
Message-ID: CACjxUsPHdNsFH6SLJ7KvSkZatD_6egn0mbmwUziUoZfd-MaWdg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Feb 27, 2016 at 3:57 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 27 February 2016 at 17:54, Kevin Grittner <kgrittn(at)gmail(dot)com> wrote:
>>
>> On a single database SSI can see whether a read has
>> caused such a problem. If you replicate the transactions to
>> somewhere else and read them SSI cannot tell whether there is an
>> anomaly
>
> OK, I thought you were saying something else. What you're saying is that SSI
> doesn't work on replicas, yet, whether that is physical or logical.

Right.

> Row level locking (S2PL) can be used on logical standbys, so it's actually a
> better situation.

Except that S2PL has the concurrency and performance problems that
caused us to rip out a working S2PL implementation in PostgreSQL
core. Layering it on outside of that isn't going to offer better
concurrency or perform better than what we ripped out; but it does
work.

>> One possibility is to pass along information
>> about when things are in a state on the source that is known to be
>> free of anomalies if read; another would be to reorder the
>> application of transactions to match the apparent order of
>> execution. The latter would not work for "physical" replication,
>> but should be fine for logical replication. An implementation
>> might create a list in commit order, but not release the front of
>> the list for processing if it is a SERIALIZABLE transaction which
>> has written data until all overlapping SERIALIZABLE transactions
>> complete, so it can move any subsequently-committed SERIALIZABLE
>> transaction which read the "old" version of the data ahead of it.
>
> The best way would be to pass across "anomaly barriers", since they can
> easily be inserted into the WAL stream. The main issue seems to be how and
> when to detect them.

That, and how to choose whether to run right away with the last
known consistent snapshot, or wait for the next one. There seem to
be use cases for both. None of it seems extraordinarily hard; it's
just never been anyone's top priority. :-/
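
To make the choice between the last known consistent snapshot and the next one concrete, here is a minimal sketch. It is only an illustration under stated assumptions, not anything that exists in PostgreSQL: `BarrierTracker`, `barrier_replayed`, and `safe_position` are hypothetical names, and a "position" is just an opaque marker for the point in the stream where an anomaly barrier was replayed.

```python
import threading


class BarrierTracker:
    """Hypothetical replica-side record of the newest replayed anomaly barrier."""

    def __init__(self):
        self._cond = threading.Condition()
        self._last_barrier = None  # position of the newest barrier, if any

    def barrier_replayed(self, position):
        # Called by the (assumed) replay process each time it passes a barrier.
        with self._cond:
            self._last_barrier = position
            self._cond.notify_all()

    def safe_position(self, wait_for_fresh=False, timeout=None):
        """Return a position that is known to be free of anomalies if read.

        wait_for_fresh=False: run right away on the last known barrier.
        wait_for_fresh=True: block until a newer barrier is replayed; if the
        timeout expires first, fall back to the last known barrier.
        """
        with self._cond:
            if wait_for_fresh:
                seen = self._last_barrier
                self._cond.wait_for(lambda: self._last_barrier != seen, timeout)
            return self._last_barrier
```

A reporting query that tolerates slightly old data would call `safe_position()`; one that needs current data would call `safe_position(wait_for_fresh=True)` and accept the wait.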

> For logical replay, applying in batches is actually a good thing since it
> allows parallelism. We can remove them all from the target's procarray all
> at once to avoid intermediate states becoming visible. So that would be the
> preferred mechanism.

That could be part of a solution. What I sketched out with the
"apparent order of execution" ordering of the transactions
(basically, commit order except when one SERIALIZABLE transaction
needs to be dragged in front of another due to a read-write
dependency) is possibly the simplest approach, but batching may
well give better performance.
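
As a rough sketch of that ordering, assuming each transaction arrives with the set of transactions that must be applied before it: the `Txn` type, its `apply_after` field, and `apparent_order` are hypothetical names for illustration, and the cycle check is only a safeguard, on the assumption that SSI on the source has already cancelled any transaction that would create one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    xid: int
    # Hypothetical field: xids of transactions that read the "old" version
    # of data this transaction wrote, so they must be applied first.
    apply_after: frozenset = frozenset()


def apparent_order(commit_order):
    """Return xids in commit order, dragging rw-dependency readers ahead."""
    by_xid = {t.xid: t for t in commit_order}
    position = {t.xid: i for i, t in enumerate(commit_order)}
    done, in_progress, result = set(), set(), []

    def emit(txn):
        if txn.xid in done:
            return
        if txn.xid in in_progress:
            raise ValueError(f"dependency cycle at xid {txn.xid}")
        in_progress.add(txn.xid)
        # Dependencies outside this batch are assumed to be applied already.
        deps = [x for x in txn.apply_after if x in by_xid]
        for dep in sorted(deps, key=position.__getitem__):
            emit(by_xid[dep])
        in_progress.discard(txn.xid)
        done.add(txn.xid)
        result.append(txn.xid)

    for txn in commit_order:
        emit(txn)
    return result
```

For example, with commit order 10, 11, 12, 13, where 12 read data that 11 overwrote, `apparent_order([Txn(10), Txn(11, frozenset({12})), Txn(12), Txn(13)])` yields `[10, 12, 11, 13]`. Transactions without such a dependency keep their commit-order position.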

> Collecting a list of transactions that must be applied before the current
> one could be accumulated during SSI processing and added to the commit
> record. But reordering the transaction apply is something we'd need to get
> some real clear theory on before we considered it.

Oh, there is a lot of very clear theory on it. I even considered
whether it might work at the physical level, but that seems fraught
with potential land-mines due to the subtle ways in which we manage
race conditions at the detail level. It's one of those things that
seems theoretically possible, but probably a really bad idea in
practice. For logical replication, though, there is a clear way to
determine a reasonable order of applying changes that will never
yield a serialization anomaly -- if we do that, we dodge the choice
between using a "stale" safe snapshot or waiting an indeterminate
length of time for a "fresh" safe snapshot -- at the cost of
delaying logical replication itself at various points.
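
The delay mentioned above can be illustrated with a small hold-back queue on the apply side. This is a sketch under assumptions, not a description of any existing code: `HoldBackQueue` and its parameters are hypothetical, and the caller is assumed to know, for each committing SERIALIZABLE transaction that wrote data, which overlapping SERIALIZABLE transactions are still running, and which queued transactions' writes the committing transaction did not see.

```python
class HoldBackQueue:
    """Hypothetical commit-order apply queue for logical replication."""

    def __init__(self):
        self._queue = []  # entries are (xid, set of xids still being waited on)

    def commit(self, xid, overlapping=(), read_old_version_of=()):
        """Record a commit and return the xids that may now be applied.

        overlapping: for a SERIALIZABLE writer, the overlapping SERIALIZABLE
            transactions still in progress; empty for everything else.
        read_old_version_of: queued xids whose writes this transaction did
            not see, so it must be applied ahead of them.
        """
        self._finish(xid)
        entry = (xid, set(overlapping))
        ahead_of = [i for i, (queued, _) in enumerate(self._queue)
                    if queued in read_old_version_of]
        if ahead_of:
            self._queue.insert(min(ahead_of), entry)  # drag in front
        else:
            self._queue.append(entry)  # plain commit order
        return self._release()

    def abort(self, xid):
        """An aborted transaction no longer holds anything back."""
        self._finish(xid)
        return self._release()

    def _finish(self, xid):
        for _, waiting in self._queue:
            waiting.discard(xid)

    def _release(self):
        # Only the front of the list is released, so apply order is preserved.
        ready = []
        while self._queue and not self._queue[0][1]:
            ready.append(self._queue.pop(0)[0])
        return ready
```

If transaction 2 commits while overlapping transaction 3 is still running, `commit(2, overlapping={3})` returns nothing, and a later unrelated `commit(4)` also returns nothing because it queues behind 2. Once `commit(3, read_old_version_of={2})` arrives, the queue releases `[3, 2, 4]`: no anomaly is ever visible on the target, but transactions 2 and 4 were delayed.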

Anyway, we seem to be on the same page; just some minor
miscommunication at some point. I apologize if I was unclear.

Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
