Re: Replication Node Identifiers and crashsafe Apply Progress

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Steve Singer <steve(at)ssinger(dot)info>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Replication Node Identifiers and crashsafe Apply Progress
Date: 2013-11-21 11:04:55
Message-ID: 20131121110455.GH7240@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2013-11-19 18:49:27 -0500, Steve Singer wrote:
> >But for that the receiving side needs to know up to where changes have
> >been applied. One relatively easy solution for that is that the
> >receiving side does something like:
> >UPDATE replication_progress SET lsn = '0/10000600' WHERE source_id = ...;
> >before the end of every replayed transaction. But that obviously will
> >quickly cause bloat.
>
> I don't see how this is going to cause any more bloat than what
> trigger-based slony does today with sl_confirm and I don't hear a lot of
> complaints about that being a big problem.

FWIW, bloat on slony's tables (including sl_confirm) is one of the major
reasons I've seen people move away from slony for production, and use it
only for upgrades.
It's only really a problem if you have longrunning transactions on the
standby, but that's a pretty major use-case of having replicas.

> This might be because slony doesn't do a commit on the replica for
> every transaction but groups the transactions together, logical slony
> will behave the same way where we would only commit on SYNC
> transactions.

But yes, the grouping of transactions certainly makes for a major
difference. I don't think we want to force solutions to commit
transactions in batches. Not the least because that obviously prohibits
using a standby as a synchronous replica.

> >* Do we want to allow setting (remote_lsn, remote_timestamp,
> > remote_node_id) via SQL? Currently the remote_node_id can be set as a
> > GUC, but the other's can't. They probably should be a function that
> > can be called instead of GUCs?
>
> A way of advancing the replication pointer via SQL would be nice, otherwise
> I'll just have to write my own C function that I will invoke via SQL (which
> sin't hard but everyone would need to do the same)

But don't you already essentially perform the actual inserts via C in
new slonys? That's mainly the reason I wasn't sure it's needed.

But then, providing a function to do that setup isn't hard.

> What does building up node_id key from (sysid, tlid, remote_dbid,
> local_dbid, name) get us over just mapping from an arbitrary name
> field to a 16 bit node_id ?

It avoids the need to manually assign ids to systems in many cases. I've
seen people complain about that a fair bit.

But it seems pretty clear that a more arbitrary identifier is preferred
so far, so I'll go for that.

Thanks for the comments,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2013-11-21 11:12:27 Re: Replication Node Identifiers and crashsafe Apply Progress
Previous Message David Rowley 2013-11-21 10:52:37 b21de4e7b32f868a23bdc5507898d36cbe146164 seems to be two bricks shy of a load