From: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
---|---|
To: | Simon Riggs <simon(at)2ndQuadrant(dot)com>, Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Unarchived WALs deleted after crash |
Date: | 2013-02-15 16:02:32 |
Message-ID: | 511E5C18.4020201@vmware.com |
Lists: | pgsql-hackers |

On 15.02.2013 17:12, Simon Riggs wrote:
> On 15 February 2013 14:31, Heikki Linnakangas<hlinnakangas(at)vmware(dot)com> wrote:
>>> - /*
>>> - * Normally we don't delete old XLOG files during recovery to
>>> - * avoid accidentally deleting a file that looks stale due to a
>>> - * bug or hardware issue, but in fact contains important data.
>>> - * During streaming recovery, however, we will eventually fill the
>>> - * disk if we never clean up, so we have to. That's not an issue
>>> - * with file-based archive recovery because in that case we
>>> - * restore one XLOG file at a time, on-demand, and with a
>>> - * different filename that can't be confused with regular XLOG
>>> - * files.
>>> - */
>>> - if (WalRcvInProgress() || XLogArchiveCheckDone(xlde->d_name))
>>> + if (RecoveryInProgress() || XLogArchiveCheckDone(xlde->d_name))
>>> [ delete the file ]
>>
>> With that commit, we started to keep WAL segments restored from the archive
>> in pg_xlog, so we needed to start deleting old segments during archive
>> recovery, even when streaming replication was not active. But the above
>> change was too broad; we started to delete old segments also during crash
>> recovery.
>>
>> The above should check InArchiveRecovery, i.e. only delete old files when in
>> archive recovery, not when in crash recovery. But there's one little
>> complication: InArchiveRecovery is currently only valid in the startup
>> process, so we'll need to also share it in shared memory, so that the
>> checkpointer process can access it.
>>
>> I propose the attached patch to fix it.
>
> Agree with your diagnosis and fix.

Ok, committed. For the sake of the archives, attached is a script based
on Jehan-Guillaume's description that I used for testing (incidentally
based on Kyotaro's script to reproduce an unrelated problem in another
thread).

Thanks for the report!

- Heikki

Attachment | Content-Type | Size |
---|---|---|
unarchived-wal-removed.sh | application/x-sh | 383 bytes |