From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | Dimitri Fontaine <dfontaine(at)hi-media(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Failback with log shipping |
Date: | 2010-05-28 20:49:13 |
Message-ID: | 4C002C49.4070501@enterprisedb.com |
Lists: | pgsql-hackers |
On 28/05/10 22:20, Dimitri Fontaine wrote:
> Heikki Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> Not shipped before the first failover you mean? No, if any WAL records were
>> created in the old master that were not shipped to the standby before
>> failover, the corresponding changes to the data files might've been flushed
>> to disk already, and you can't undo those by not replaying the WAL record on
>> restart.
>
> Ah yes you need to fail between when (WAL is written and not sent) and
> CHECKPOINT for this to be possible.
A checkpoint only guarantees that everything before it has been flushed to
disk. It doesn't guarantee that nothing is flushed to disk until the
checkpoint happens.
If there's a checkpoint that hasn't been shipped to the standby, you're
certainly hosed, but if there is no checkpoint you don't know if the
data files have changed or not.
> But automatic testing of the
> situation (is the data already safe in PGDATA) might still be possible?
Hmm, so the situation is this:
           D - E - crash!
          /
A - B - C
          \
           d - f - g - h
The letters represent WAL records. C is the last WAL record that was
shipped to the standby, D & E are WAL records that were generated in the
old master before the crash but never sent to the standby, and d-h are
WAL records created in the standby after failover.
I guess you could read the WAL in the old master and compare it with the
WAL from the standby to figure out where the failover happened (C), and
then scan all the data pages involved in records D - E, checking that
the LSNs on the data pages touched by those records are earlier than C.
That's a bit laborious, and requires knowledge of all different kinds of
WAL records to figure out which data pages they touch, but seems
possible in theory.
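
A minimal sketch of that check, under heavy simplifying assumptions: WAL records are modelled as small Python objects that already list the pages they touch (the laborious part in practice), LSNs are plain integers, and a page's LSN is the LSN of the last record applied to it. The names `WalRecord`, `find_fork_point` and `old_master_is_reusable` are hypothetical and are not PostgreSQL APIs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical page identifier: (relation name, block number).
PageId = Tuple[str, int]


@dataclass(frozen=True)
class WalRecord:
    lsn: int                       # simplified: one integer per record
    payload: str                   # stands in for the record contents
    pages: Tuple[PageId, ...] = () # data pages this record touches
    is_checkpoint: bool = False


def find_fork_point(old: List[WalRecord], new: List[WalRecord]) -> Optional[int]:
    """Return the LSN of the last record common to both streams (C), if any."""
    fork = None
    for a, b in zip(old, new):
        if a.lsn != b.lsn or a.payload != b.payload:
            break
        fork = a.lsn
    return fork


def old_master_is_reusable(
    old: List[WalRecord],
    new: List[WalRecord],
    page_lsns: Dict[PageId, int],
) -> bool:
    """True if no effect of the unshipped records (D, E) reached the data files.

    page_lsns maps each on-disk page to its page LSN as read from the old
    master's data files. Pages absent from the map are treated as never
    written (an assumption of this sketch).
    """
    fork = find_fork_point(old, new)
    if fork is None:
        return False
    for rec in old:
        if rec.lsn <= fork:
            continue  # shipped to the standby before failover
        if rec.is_checkpoint:
            return False  # an unshipped checkpoint means changes were flushed
        for page in rec.pages:
            # In this model a page LSN at or before C means the page on disk
            # does not yet contain the unshipped change.
            if page_lsns.get(page, 0) > fork:
                return False
    return True
```

With records A, B, C shared by both streams and D, E only on the old master, the function returns True only when every page touched by D and E still carries an LSN no later than C.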
>>> How easy is it to script that? It seems all we need is the current XID
>>> of the slave at the end of recovery. It should be in the log, maybe it's
>>> easy enough to expose it at the SQL level…
>>
>> XID doesn't help at all, LSN more likely, but I feel that I don't fully
>> understand what you're saying.
>
> Sorry I was unclear, I was thinking in terms of recovery.conf file and
> either recovery_target_xid or recovery_target_time. The idea being that
> if the old-master didn't CHECKPOINT the changes that the slave missed,
> then we can do crash recovery and choose to stop before that point, then
> apply WALs from the new master.
Ah, I see. No, you don't want to use a recovery target, that would end
the recovery and start the server. You just need to make sure to use
WALs from the new master instead of the old one when both exist.
> So you're saying controlled failover could possibly skip base backup to
> reuse old master as new slave, and I'm asking if by some luck (crash
> happened before CHECKPOINT) and some recovery.conf setup we could get to
> the same situation in case of hard failure. That would allow completely
> automatic switchover / failover with no need to resync.
Yeah, that would be nice. In practice, I think you would get lucky more
often than not, because whenever you modify and dirty a page, writing a
WAL record, the usage count on the buffer is incremented and it won't be
evicted from the buffer cache for a while.
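
To illustrate why a recently dirtied page tends to stay in memory, here is a toy clock-sweep model in the spirit of the buffer replacement described above. It is a sketch only: the class and method names are hypothetical, pinning and locking are ignored, and the model says nothing about pages written out by checkpoints or the background writer.

```python
class ToyBufferPool:
    """Toy clock-sweep replacement with per-buffer usage counts."""

    MAX_USAGE = 5  # cap on the usage count, as in PostgreSQL's buffer manager

    def __init__(self, nbuffers: int) -> None:
        self.usage = [0] * nbuffers
        self.hand = 0  # position of the clock hand

    def touch(self, buf: int) -> None:
        """Record an access (for example, modifying the page)."""
        self.usage[buf] = min(self.usage[buf] + 1, self.MAX_USAGE)

    def pick_victim(self) -> int:
        """Advance the hand, decrementing counts, until a count of 0 is found."""
        while True:
            buf = self.hand
            self.hand = (self.hand + 1) % len(self.usage)
            if self.usage[buf] == 0:
                return buf
            self.usage[buf] -= 1
```

A buffer that was just touched survives at least one full pass of the hand before it becomes an eviction candidate, which is the "for a while" in the paragraph above.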
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com