From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
Cc: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Sync Rep: First Thoughts on Code |
Date: | 2008-12-02 19:08:06 |
Message-ID: | 1228244886.14591.45.camel@dell.linuxdev.us.dell.com |
Lists: | pgsql-hackers |
On Tue, 2008-12-02 at 13:09 +0000, Simon Riggs wrote:
> > Is it dangerous to abort the transaction with replication continued when
> > the timeout occurs? I think that the WAL consistency between two servers
> > might be broken. Because the WAL writing and sending are done concurrently,
> > and the backend might already write the WAL to disk on the primary when
> > waiting for walsender.
>
> The issue I see is that we might want to keep wal_sender_delay small so
> that transaction times are not increased. But we also want
> wal_sender_delay high so that replication never breaks. It seems better
> to have the action on wal_sender_delay configurable if we have an
> unsteady network (like the internet). Marcus made some comments on line
> dropping that seem relevant here; we should listen to his experience.
>
> Hmmm, dangerous? Well assuming we're linking commits with replication
> sends then it sounds it. We might end up committing to disk and then
> deciding to abort instead. But remember we don't remove the xid from
> procarray or mark the result in clog until the flush is over, so it is
> possible. But I think we should discuss this in more detail when the
> main patch is committed.
>
What is the "it" in "it is possible"? It seems like there's still a
problem window in there.

Even if that could be made safe, in the event of a real network failure
you'd just wait the full timeout on every transaction, because the
primary still thinks it's replicating.

If the timeout is exceeded, it seems more reasonable to abandon the
slave until you can re-sync it, and to continue processing as normal in
the meantime. As you pointed out, re-syncing is not necessarily an
expensive operation because you can use something like rsync. The
process of re-syncing might be made easier (or perhaps less costly), of
course.

If we want to still allow processing to happen after a timeout, it seems
reasonable to have a configurable option to allow/disallow non-read-only
transactions when out of sync.
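
To make the proposed behaviour concrete, here is a rough sketch of the policy as a toy simulation. It is not taken from the patch: the names `ack_timeout_s`, `allow_writes_when_out_of_sync`, `Primary`, and `resync` are all hypothetical, and I am assuming that the transaction which hits the timeout still commits locally, since its WAL may already be on disk.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SyncState(Enum):
    IN_SYNC = "in_sync"
    OUT_OF_SYNC = "out_of_sync"


class WriteRejected(Exception):
    """Raised when a write is refused because the standby is out of sync."""


@dataclass
class Primary:
    # Hypothetical settings; neither is a real PostgreSQL parameter.
    ack_timeout_s: float = 1.0
    allow_writes_when_out_of_sync: bool = True
    state: SyncState = SyncState.IN_SYNC
    total_wait_s: float = 0.0  # simulated time spent waiting for acks

    def commit_write(self, ack_latency_s: Optional[float]) -> bool:
        """Commit one write transaction.

        ack_latency_s is the simulated time the standby needs to
        acknowledge, or None if it never answers (network failure).
        Returns True if the commit was replicated, False if local only.
        """
        if self.state is SyncState.OUT_OF_SYNC:
            if not self.allow_writes_when_out_of_sync:
                raise WriteRejected("standby out of sync; writes disallowed")
            return False  # commit locally without waiting for the standby

        if ack_latency_s is not None and ack_latency_s <= self.ack_timeout_s:
            self.total_wait_s += ack_latency_s
            return True

        # Timeout exceeded: pay the full wait once, then abandon the
        # standby so later transactions do not wait again.
        # Assumption: this transaction still commits locally.
        self.total_wait_s += self.ack_timeout_s
        self.state = SyncState.OUT_OF_SYNC
        return False

    def resync(self) -> None:
        """Stand-in for re-syncing the standby (e.g. via rsync)."""
        self.state = SyncState.IN_SYNC
```

With this policy a dead link costs one timeout in total rather than one timeout per transaction, and the option decides whether writes continue or are refused until `resync()` is called.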
Regards,
Jeff Davis