From: | konstantin knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> |
---|---|
To: | Craig Ringer <craig(at)2ndquadrant(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Logical replication and multimaster |
Date: | 2015-12-03 06:54:23 |
Message-ID: | 2C8345A7-1B2B-4AFD-899F-A4C8C71CF28C@postgrespro.ru |
Lists: | pgsql-hackers |
On Dec 3, 2015, at 4:09 AM, Craig Ringer wrote:
> On 1 December 2015 at 00:20, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>
> > We have implemented ACID multimaster based on logical replication and our DTM (distributed transaction manager) plugin.
>
> What are you using for an output plugin and for replay?
I have implemented the output plugin for multimaster based on Michael's decoder_raw+receiver_raw.
Right now it decodes WAL into the corresponding SQL insert/update statements.
Certainly this is a very inefficient approach, and in the future I will replace it with some binary protocol, as is used for example in BDR
(but the BDR plugin contains a lot of stuff related to detecting and handling conflicts which is not relevant for multimaster).
But right now the performance of the multimaster is not limited by the logical replication protocol: if I remove the DTM and use asynchronous replication (a lightweight version of BDR :)
then I get 38k TPS instead of 12k.
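For illustration, here is a minimal sketch of a decoder_raw-style change callback that emits decoded changes as SQL text, in the spirit of what is described above. It is a hypothetical fragment, not the actual multimaster plugin: the callback name is made up, only the INSERT case is shown, and column-value extraction is elided.

```c
/* Hypothetical sketch: a logical decoding change callback that renders
 * a decoded change as SQL text.  Real plugins extract and quote the
 * tuple's column values; that part is omitted here. */
static void
my_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
                 Relation relation, ReorderBufferChange *change)
{
    OutputPluginPrepareWrite(ctx, true);
    switch (change->action)
    {
        case REORDER_BUFFER_CHANGE_INSERT:
            appendStringInfo(ctx->out, "INSERT INTO %s ...",
                             RelationGetRelationName(relation));
            /* values would come from change->data.tp.newtuple here */
            break;
        default:
            /* UPDATE/DELETE handled analogously */
            break;
    }
    OutputPluginWrite(ctx, true);
}
```

Sending SQL text like this is simple but verbose, which is one reason a binary protocol (as in BDR or pglogical_output) is more efficient on the wire.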
>
> I'd really like to collaborate using pglogical_output if at all possible. Petr's working really hard to get the pglogical downstream out too, with me helping where I can.
>
> I'd hate to be wasting time and effort working in parallel on overlapping functionality. I did a LOT of work to make pglogical_output extensible and reusable for different needs, with hooks used heavily instead of making things specific to the pglogical downstream. A protocol documented in detail. A json output mode as an option. Parameters for clients to negotiate options. etc.
>
> Would a different name for the upstream output plugin help?
Where can I get the pglogical_output plugin? Sorry, but I can't quickly find a reference with Google...
Also, I wonder whether this plugin performs DDL replication (most likely not). But then a naive question: why was DDL excluded from the logical replication protocol?
Are there some fundamental problems with it? In BDR it was handled in an alternative way, using an executor callback. It would be much easier if DDL could be replicated in the same way as normal SQL statements.
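For context, the "executor callback" approach mentioned here can be sketched as a ProcessUtility_hook that captures the DDL command text so it can be forwarded to peer nodes. This is a hypothetical fragment using the 9.5-era hook signature; forward_ddl_to_peers() is an illustrative placeholder, not a real function.

```c
/* Hypothetical sketch: capture DDL via ProcessUtility_hook so its text
 * can be forwarded to peer nodes, rather than via logical decoding. */
static ProcessUtility_hook_type prev_ProcessUtility = NULL;

static void
ddl_capture_utility(Node *parsetree, const char *queryString,
                    ProcessUtilityContext context, ParamListInfo params,
                    DestReceiver *dest, char *completionTag)
{
    /* forward_ddl_to_peers() is an illustrative placeholder */
    forward_ddl_to_peers(queryString);

    if (prev_ProcessUtility)
        prev_ProcessUtility(parsetree, queryString, context,
                            params, dest, completionTag);
    else
        standard_ProcessUtility(parsetree, queryString, context,
                                params, dest, completionTag);
}

void
_PG_init(void)
{
    prev_ProcessUtility = ProcessUtility_hook;
    ProcessUtility_hook = ddl_capture_utility;
}
```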
>
> > And according to 2ndquadrant results, BDR performance is very close to hot standby.
>
> Yes... but it's asynchronous multi-master. Very different to what you're doing.
>
> > I wonder whether it is a principal limitation of the logical replication approach, which is efficient only for asynchronous replication, or whether it can be somehow tuned/extended to efficiently support synchronous replication?
>
> I'm certain there are improvements to be made for synchronous replication.
>
> > We have also considered alternative approaches:
> > 1. Statement based replication.
>
> Just don't go there. Really.
>
> > It seems to be better to have one connection between nodes, but provide parallel execution of received transactions at the destination side.
>
> I agree. This is something I'd like to be able to do through logical decoding. As far as I can tell there's no fundamental barrier to doing so, though there are a few limitations when streaming logical xacts:
>
> - We can't avoid sending transactions that get rolled back
>
> - We can't send the commit timestamp, commit LSN, etc at BEGIN time, so last-update-wins
> conflict resolution can't be done based on commit timestamp
>
> - When streaming, the xid must be in each message, not just in begin/commit.
>
> - The apply process can't use the SPI to apply changes directly since we can't multiplex transactions. It'll need to use
> shmem to communicate with a pool of workers, dispatching messages to workers as they arrive. Or it can multiplex
> a set of libpq connections in async mode, which I suspect may prove to be better.
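The second option above (multiplexing libpq connections in async mode) can be sketched roughly as follows. This is a hypothetical fragment: next_stmt() stands in for the queue of decoded statements, transaction-to-connection affinity and error handling are omitted, and the connections are assumed to already be open.

```c
/* Hypothetical sketch: multiplex several libpq connections with
 * select(), sending one statement per idle connection. */
#include <libpq-fe.h>
#include <sys/select.h>
#include <stddef.h>

void
apply_loop(PGconn *conns[], int nconns, const char *(*next_stmt)(void))
{
    for (;;)
    {
        fd_set  rfds;
        int     maxfd = -1;

        FD_ZERO(&rfds);
        for (int i = 0; i < nconns; i++)
        {
            int sock = PQsocket(conns[i]);
            FD_SET(sock, &rfds);
            if (sock > maxfd)
                maxfd = sock;
        }
        if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
            break;

        for (int i = 0; i < nconns; i++)
        {
            if (!FD_ISSET(PQsocket(conns[i]), &rfds))
                continue;
            PQconsumeInput(conns[i]);
            if (PQisBusy(conns[i]))
                continue;               /* result not ready yet */

            PGresult *res;
            while ((res = PQgetResult(conns[i])) != NULL)
                PQclear(res);           /* previous statement done */

            const char *stmt = next_stmt();
            if (stmt != NULL)
                PQsendQuery(conns[i], stmt);
        }
    }
}
```

One caveat with this scheme is that statements belonging to the same upstream transaction must all go to the same connection, so a real dispatcher would route by xid rather than picking any idle connection.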
>
> I've made provision for streaming support in the pglogical_output extension. It'll need core changes to allow logical decoding to stream changes though.
>
> Separately, I'd also like to look at decoding and sending sequence advances, which are something that happens outside transaction boundaries.
>
>
> > We now have in PostgreSQL some infrastructure for background workers, but there is still no abstraction of a worker pool and job queue which could provide a simple way to organize parallel execution of jobs. I wonder if somebody is working on it now, or should we try to propose our solution?
>
> I think a worker pool would be quite useful to have.
>
> For BDR and for pglogical we had to build an infrastructure on top of static and dynamic bgworkers. A static worker launches a dynamic bgworker for each database. The dynamic bgworker for the database looks at extension-provided user catalogs to determine whether it should launch more dynamic bgworkers for each connection to a peer node.
>
> Because the bgworker argument is a single by-value Datum the argument passed is an index into a static shmem array of structs. The struct is populated with the target database oid (or name, for 9.4, due to bgworker API limitations) and other info needed to start the worker.
>
> Because registered static and dynamic bgworkers get restarted by the postmaster after a crash/restart cycle, and the restarted static worker will register new dynamic workers after restart, we have to jump through some annoying hoops to avoid duplicate bgworkers. A generation counter is stored in postmaster memory and incremented on crash recovery then copied to shmem. The high bits of the Datum argument to the workers embeds the generation counter. They compare their argument's counter to the one in shmem and exit if the counter differs, so the relaunched old generation of workers exits after a crash/restart cycle. See the thread on BGW_NO_RESTART_ON_CRASH for details.
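The generation-counter trick described above can be illustrated with plain bit arithmetic. This is a hypothetical layout (the split point and names are made up for illustration): the worker's single argument packs the crash-recovery generation into the high bits and the index into the static shmem slot array into the low bits, so a relaunched worker can compare its embedded generation against the current one in shmem.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical layout: low 16 bits = slot index into the static shmem
 * array, remaining high bits = crash-recovery generation counter. */
#define WORKER_INDEX_BITS  16
#define WORKER_INDEX_MASK  ((1u << WORKER_INDEX_BITS) - 1)

uint32_t
pack_worker_arg(uint32_t generation, uint32_t slot_index)
{
    return (generation << WORKER_INDEX_BITS) | (slot_index & WORKER_INDEX_MASK);
}

uint32_t
arg_generation(uint32_t arg)
{
    return arg >> WORKER_INDEX_BITS;
}

uint32_t
arg_slot(uint32_t arg)
{
    return arg & WORKER_INDEX_MASK;
}
```

On startup the worker would compare arg_generation() of its argument to the counter in shmem and exit if they differ, which is how stale workers from before a crash/restart cycle weed themselves out.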
>
> In pglogical we're instead using BGW_NEVER_RESTART workers and doing restarts ourselves when needed, ignoring the postmaster's ability to restart bgworkers when the worker crashes.
>
> It's likely that most projects using bgworkers for this sort of thing will need similar functionality, so generalizing it into a worker pool API makes a lot of sense. In the process we could really use API to examine currently registered and running bgworkers. Interested in collaborating on that?
>
> Another thing I've wanted as part of this work is a way to get a one-time authentication cookie from the server that can be passed as a libpq connection option to get a connection without having to know a password or otherwise mess with pg_hba.conf. Basically a way to say "I'm a bgworker running with superuser rights within Pg, and I want to make a libpq connection to this database. I'm inherently trusted, so don't mess with pg_hba.conf and passwords, just let me in".
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services