From: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com> |
Cc: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Sync Rep: First Thoughts on Code |
Date: | 2008-12-02 19:21:18 |
Message-ID: | 1228245678.20796.410.camel@hp_dx2400_1 |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, 2008-12-02 at 11:08 -0800, Jeff Davis wrote:
> On Tue, 2008-12-02 at 13:09 +0000, Simon Riggs wrote:
> > > Is it dangerous to abort the transaction with replication continued when
> > > the timeout occurs? I think that the WAL consistency between two servers
> > > might be broken. Because the WAL writing and sending are done concurrently,
> > > and the backend might already write the WAL to disk on the primary when
> > > waiting for walsender.
> >
> > The issue I see is that we might want to keep wal_sender_delay small so
> > that transaction times are not increased. But we also want
> > wal_sender_delay high so that replication never breaks. It seems better
> > to have the action on wal_sender_delay configurable if we have an
> > unsteady network (like the internet). Marcus made some comments on line
> > dropping that seem relevant here; we should listen to his experience.
> >
> > Hmmm, dangerous? Well assuming we're linking commits with replication
> > sends then it sounds it. We might end up committing to disk and then
> > deciding to abort instead. But remember we don't remove the xid from
> > procarray or mark the result in clog until the flush is over, so it is
> > possible. But I think we should discuss this in more detail when the
> > main patch is committed.
> >
>
> What is the "it" in "it is possible"? It seems like there's still a
> problem window in there.
Marking a transaction aborted after we have written a commit record, but
before we have removed it from proc array and marked in clog. We'd need
a special kind of WAL record to do that.
> Even if that could be made safe, in the event of a real network failure,
> you'd just wait the full timeout every transaction, because it still
> thinks it's replicating.
True, but I did suggest having two timeouts.
There is considerable reason to reduce the timeout as well as reason to
increase it - at the same time.
Anyway, lets wait for some user experience following commit.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
From | Date | Subject | |
---|---|---|---|
Next Message | Heikki Linnakangas | 2008-12-02 20:13:02 | Re: pg_stop_backup wait bug fix |
Previous Message | Simon Riggs | 2008-12-02 19:15:19 | Re: PiTR and other architectures.... |