From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov> |
Cc: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: warning message in standby |
Date: | 2010-06-29 14:34:40 |
Message-ID: | AANLkTinsQxwHEBmSr7dxXD6JepznOrRMF9zHoBhOC8rk@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Jun 29, 2010 at 10:21 AM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>> ...with this patch, following the above, you get:
>>
>> FATAL: invalid record in WAL stream
>> HINT: Take a new base backup, or remove recovery.conf and restart
>> in read-write mode.
>> LOG: startup process (PID 6126) exited with exit code 1
>> LOG: terminating any other active server processes
>
> If someone is sloppy about how they copy the WAL files around, they
> could temporarily have a truncated file.

Can you explain the scenario you're concerned about in more detail?

> If we want to be tolerant
> of straight file copies, without a temporary name or location with a
> move on completion, we would need some kind of retry or timeout. It
> appears that you have this hard-coded to five retries. I'm not
> saying this is a bad setting, but I always wonder about hard-coded
> magic numbers like this. What's the delay between retries? How did
> you arrive at five as the magic number?

It's approximately the number Josh Berkus suggested in an email
upthread. In other words, SWAG.

There's not a fixed delay between retries. The count represents the
number of times that we have either (a) streamed the relevant chunk
from the master via WALSender, or (b) retrieved the segment from the
archive with restore_command. The first retry CAN help, if WAL
streaming from master to standby was interrupted unexpectedly. AFAIK,
the additional retries after that are just paranoia, but I can't rule
out the possibility that I'm missing something, in which case we might
have to rethink the whole approach.
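
To make the retry accounting concrete, here is a rough Python model of the
behaviour described above. It is only a sketch, not the patch itself:
read_valid_record, InvalidWALRecord and MAX_FETCH_ATTEMPTS are hypothetical
names, and whether the count of five includes the initial attempt is an
assumption (this sketch counts every fetch).

```python
from typing import Callable

# Hypothetical constant; five is the number discussed in this thread.
MAX_FETCH_ATTEMPTS = 5


class InvalidWALRecord(Exception):
    """Models the FATAL "invalid record in WAL stream" outcome."""


def read_valid_record(
    fetch: Callable[[], bytes],
    is_valid: Callable[[bytes], bool],
    max_attempts: int = MAX_FETCH_ATTEMPTS,
) -> bytes:
    """Re-fetch the same WAL chunk until it validates or attempts run out.

    `fetch` stands in for either streaming the chunk from the master or
    running restore_command against the archive. There is deliberately no
    sleep between attempts: the counter measures fetches, not elapsed time.
    """
    for _ in range(max_attempts):
        data = fetch()
        if is_valid(data):
            return data
    raise InvalidWALRecord(
        f"invalid record in WAL stream after {max_attempts} fetch attempts"
    )


def make_fetch(results):
    """Build a fake fetch that returns the given results in order (demo only)."""
    it = iter(results)
    return lambda: next(it)


# An interrupted stream yields a truncated chunk once; the first retry succeeds.
record = read_valid_record(
    make_fetch([b"truncated", b"complete"]),
    lambda data: data == b"complete",
)
```

In this model a chunk that is still invalid after the last fetch raises
InvalidWALRecord, which corresponds to the FATAL message quoted at the top.
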
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company