From: | Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> |
---|---|
To: | Sébastien Lardière <slardiere(at)hi-media(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org, skytools-users(at)pgfoundry(dot)org |
Subject: | Re: [Skytools-users] WAL Shipping + checkpoint |
Date: | 2009-08-26 22:18:01 |
Message-ID: | 4A95B499.6030803@catalyst.net.nz |
Lists: | pgsql-general |
Sébastien Lardière wrote:
> On 26/08/2009 04:46, Mark Kirkwood wrote:
>> Sébastien Lardière wrote:
>>> Hi All,
>>>
>>> I've a cluster ( Pg 8.3.7 ) with WAL Shipping, and a few hours ago,
>>> the master had to restart.
>>>
>>> I use walmgr from Skytools, which works very well.
>>>
>>> I have already restarted the master without any problem, but today
>>> the slave doesn't behave as I want. The "Time of latest checkpoint"
>>> field from pg_controldata on the slave keeps the same value, but
>>> WAL files are processed correctly.
>>>
>>> I tried restarting the slave, but after it reprocessed all the WAL
>>> since "Time of latest checkpoint", it does nothing else; the latest
>>> checkpoint stays at the same value.
>>>
>>> I don't know if it's important ( i think so ), and I can't fix it.
>>>
>> It is normal for it to lag behind somewhat on the slave (depending on
>> what your checkpoint timeout etc settings are).
>>
>> However, I've noticed what you are seeing as well - particularly when
>> there are no actual data changes coming through in the logs - the
>> slave checkpoint time does not change even tho there have been
>> checkpoints on the master (I may have a look in the code to see what
>> the story really is...if I have time).
>>
>
> Yes, but the delay between the last checkpoint on the master and the
> slave is very high, now ( 100 000 sec ), because the last checkpoint
> on the slave was yesterday ( as far as pg_controldata is right )
>
> Here is a graph from our munin plugin:
> http://seb.ouvaton.org/tmp/bdd-pg_walmgr-week.png
>
> The blue line represents the average interval between two WAL files
> processed on the slave, and the green line the delay between the
> last checkpoint on the master and on the slave.
>
> Maybe it's not a good indicator, but the green line makes me think
> there is a problem.
>
>
Do you have archive_timeout set? If so, then what *could* be happening
is this:
There are actually no "real" data changes being made on your master for
some reason. So every time archive_timeout is reached, a log containing
no changes is shipped to your slave and applied, and the checkpoint
time on the slave is not updated, for the reasons I mentioned above.
A way to test this would be to do something that makes real data changes
on the master. A good thing to try would be to:
- create a new database
- create tables and add a reasonable amount of data (e.g. a pgbench
database initialized at scale 100).
Then see if your checkpoint time gets updated a few minutes or so later.
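To make the "see if your checkpoint time gets updated" step less manual,
here is a rough Python sketch that compares two pg_controldata outputs
taken on the slave, one before and one after loading data on the master.
The function names and the idea of comparing the raw text value are my
own assumptions for illustration, not part of walmgr or PostgreSQL. The
only thing it relies on is the "Time of latest checkpoint:" label that
pg_controldata prints.

```python
import subprocess

# Label printed by pg_controldata; the value after it is locale-dependent,
# so this sketch compares it as plain text instead of parsing a date.
CHECKPOINT_LABEL = "Time of latest checkpoint:"


def latest_checkpoint_time(controldata_output):
    """Return the raw 'Time of latest checkpoint' value, or None if absent."""
    for line in controldata_output.splitlines():
        if line.startswith(CHECKPOINT_LABEL):
            return line[len(CHECKPOINT_LABEL):].strip()
    return None


def read_controldata(data_dir):
    """Run pg_controldata on data_dir (hypothetical path) and return its output."""
    result = subprocess.run(
        ["pg_controldata", data_dir],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def checkpoint_advanced(before_output, after_output):
    """True if both outputs carry a checkpoint time and the two values differ."""
    before = latest_checkpoint_time(before_output)
    after = latest_checkpoint_time(after_output)
    return before is not None and after is not None and before != after
```

Usage would be roughly: call read_controldata() on the slave's data
directory, load the data on the master (for example with createdb and
then "pgbench -i -s 100" against the new database), wait a few minutes,
call read_controldata() again, and pass both outputs to
checkpoint_advanced().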