From: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
---|---|
To: | Daniel Farina <daniel(at)heroku(dot)com> |
Cc: | Chris Redekop <chris(at)replicon(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Hot Backup with rsync fails at pg_clog if under load |
Date: | 2011-10-23 22:39:01 |
Message-ID: | CA+U5nM+pTpT6eWrHD57y-X_MqpicPMguJPXJVvcq1nG1Rid80Q@mail.gmail.com |
Lists: | pgsql-hackers |
On Sun, Oct 23, 2011 at 9:48 PM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
> On Mon, Oct 17, 2011 at 11:30 PM, Chris Redekop <chris(at)replicon(dot)com> wrote:
>> Well, on the other hand maybe there is something wrong with the data.
>> Here's the test/steps I just did -
>> 1. I take a pg_basebackup while the master is under load; a hot slave now will
>> not start up, but a warm slave will.
>> 2. I start a warm slave and let it catch up to current
>> 3. On the slave I change 'hot_standby=on' and do a 'service postgresql
>> restart'
>> 4. Postgres fails to restart with the same error.
>> 5. I turn hot_standby back off and postgres starts back up fine as a warm
>> slave
>> 6. I then turn off the load, the slave is all caught up, master and slave
>> are both sitting idle
>> 7. I, again, change 'hot_standby=on' and do a service restart
>> 8. Again it fails, with the same error, even though there is no longer any
>> load.
>> 9. I repeat this warm-start/hot-start cycle a couple more times until, to my
>> surprise, instead of failing, it successfully starts up as a hot standby
>> (this is after maybe 5 minutes or so of sitting idle).
>> So... given that it continued to fail even after the load had been turned off,
>> I believe that the data which was copied over was invalid in some way, and
>> that when a checkpoint/log rotation/something else occurred while not under
>> load, it cleared itself up.... I'm shooting in the dark here.
>> Anyone have any suggestions/ideas/things to try?
>
> Having dug at this a little (but not too much), the problem
> seems to be that postgres is reading the commit logs way, way too
> early, that is to say, before it has replayed enough WAL to be
> 'consistent' (the WAL between pg_start_backup and pg_stop_backup). I have
> not been able to reproduce this problem (I think) after the message
> from postgres suggesting it has reached a consistent state; at that
> time I am able to go into hot-standby mode.
>
> The message is like: "consistent recovery state reached at %X/%X".
> (this is the errmsg)
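The `%X/%X` placeholders in that message print a WAL location. A minimal sketch, assuming the pre-9.3 layout in which a WAL position is two 32-bit halves (log file id and byte offset), of how such a location is printed and how "replay has reached the consistency point" can be decided; `WalPos` and `reached_consistency` are hypothetical names, not PostgreSQL functions:

```python
from typing import NamedTuple


class WalPos(NamedTuple):
    """Hypothetical stand-in for a pre-9.3 WAL location (two 32-bit halves)."""
    xlogid: int   # high half: log file number
    xrecoff: int  # low half: byte offset within that log file

    def __str__(self) -> str:
        # Same "%X/%X" style as the server's log message (uppercase hex).
        return f"{self.xlogid:X}/{self.xrecoff:X}"


def reached_consistency(replayed: WalPos, min_recovery_point: WalPos) -> bool:
    """True once replay has reached the minimum recovery point.

    NamedTuple ordering compares xlogid first, then xrecoff, which is the
    right ordering for a two-part WAL position.
    """
    return replayed >= min_recovery_point


replayed = WalPos(0x1, 0x2A000020)
min_point = WalPos(0x1, 0x2B000000)
print(min_point)                                   # 1/2B000000
print(reached_consistency(replayed, min_point))    # False
```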
>
> It doesn't seem meaningful for StartupCLOG (or, indeed, any of the
> hot-standby path functionality) to be called before that code is
> executed, but right now it is called anyway. I'm not sure whether this
> is simply an oversight or indicative of a misplaced assumption
> somewhere. Basically, my thoughts for a fix are to suppress
> hot_standby = on (in spirit) before the consistent recovery state is
> reached.
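Daniel's proposal amounts to gating hot-standby work on the consistency check. A minimal sketch of that idea, under stated assumptions: `StandbyState`, `replay`, and the single-integer WAL positions are hypothetical simplifications, not PostgreSQL's actual startup code, and whether the real fix should look like this is exactly what the reply below leaves open.

```python
from dataclasses import dataclass, field


@dataclass
class StandbyState:
    """Hypothetical model of a recovering standby; not PostgreSQL's real structures."""
    hot_standby: bool            # the configured setting, e.g. hot_standby = on
    min_recovery_point: int      # WAL position (simplified to one integer) needed for consistency
    replayed_up_to: int = 0
    consistent: bool = False
    accepting_queries: bool = False
    log: list[str] = field(default_factory=list)

    def replay(self, record_end: int) -> None:
        """Apply one WAL record ending at record_end, then re-check consistency."""
        self.replayed_up_to = record_end
        if not self.consistent and self.replayed_up_to >= self.min_recovery_point:
            self.consistent = True
            self.log.append("consistent recovery state reached")
        # Proposed gating: hot-standby work (commit log lookups, read-only
        # connections) starts only after consistency, even if hot_standby is on.
        if self.hot_standby and self.consistent and not self.accepting_queries:
            self.accepting_queries = True
            self.log.append("read-only connections enabled")


s = StandbyState(hot_standby=True, min_recovery_point=300)
for end in (100, 200, 300, 400):
    s.replay(end)
    print(end, s.accepting_queries)
```

With the values above, the standby stays closed to queries for the records ending at 100 and 200 and opens only once replay reaches 300, instead of touching the commit log at startup.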
Not sure about that, but I'll look at where this comes from.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services