Re: streaming replication does not work across datacenter with 20ms latency?

From: Yan Chunlu <springrider(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>, pgsql-general General <pgsql-general(at)postgresql(dot)org>
Subject: Re: streaming replication does not work across datacenter with 20ms latency?
Date: 2011-07-24 16:20:15
Message-ID: CAOA66tEH363sKu1APV9QCXnxntkLBVwtnxZ3xowGtgDZtDrvmw@mail.gmail.com
Lists: pgsql-general

I did the SR procedure again, still no luck.

Is it normal that, after starting PostgreSQL on the slave, the first line of the log is:
database system was interrupted; last known up at 2011-07-24 10:53:38 CDT?

4760 2011-07-24 10:55:58 CDT 2011-07-24 10:55:58 CDT @ LOG: database
system was interrupted; last known up at 2011-07-24 10:53:38 CDT
4760 2011-07-24 10:55:58 CDT 2011-07-24 10:55:58 CDT @ LOG: entering
standby mode
4762 2011-07-24 10:55:59 CDT 2011-07-24 10:55:59 CDT postgres(at)postgres
[local]FATAL: the database system is starting up
4761 2011-07-24 10:55:59 CDT 2011-07-24 10:55:59 CDT @ LOG: streaming
replication successfully connected to primary
4764 2011-07-24 10:55:59 CDT 2011-07-24 10:55:59 CDT postgres(at)postgres
10.28.53.11(53442)FATAL: the database system is starting up
4770 2011-07-24 10:56:00 CDT 2011-07-24 10:56:00 CDT postgres(at)postgres
[local]FATAL: the database system is starting up
4802 2011-07-24 10:56:01 CDT 2011-07-24 10:56:01 CDT postgres(at)postgres
[local]FATAL: the database system is starting up
4760 2011-07-24 10:56:01 CDT 2011-07-24 10:56:01 CDT @ LOG: redo
starts at 57/6B002028
4760 2011-07-24 10:56:01 CDT 2011-07-24 10:56:01 CDT @ LOG: invalid
record length at 57/6B20E010
4761 2011-07-24 10:56:01 CDT 2011-07-24 10:56:01 CDT @ FATAL:
terminating walreceiver process due to administrator command
4760 2011-07-24 10:56:01 CDT 2011-07-24 10:56:01 CDT @ LOG: invalid
magic number 0000 in log file 87, segment 107, offset 2490368
4847 2011-07-24 10:56:02 CDT 2011-07-24 10:56:02 CDT postgres(at)postgres
[local]FATAL: the database system is starting up
4850 2011-07-24 10:56:02 CDT 2011-07-24 10:56:02 CDT postgres(at)postgres
10.28.53.11(53443)FATAL: the database system is starting up
4851 2011-07-24 10:56:03 CDT 2011-07-24 10:56:03 CDT postgres(at)postgres
[local]FATAL: the database system is starting up
4860 2011-07-24 10:56:04 CDT 2011-07-24 10:56:04 CDT postgres(at)postgres
[local]FATAL: the database system is starting up
4865 2011-07-24 10:56:05 CDT 2011-07-24 10:56:05 CDT postgres(at)postgres
[local]FATAL: the database system is starting up
4859 2011-07-24 10:56:05 CDT 2011-07-24 10:56:05 CDT @ LOG: streaming
replication successfully connected to primary
4874 2011-07-24 10:56:06 CDT 2011-07-24 10:56:06 CDT postgres(at)postgres
[local]FATAL: the database system is starting up
4869 2011-07-24 10:56:06 CDT 2011-07-24 10:56:06 CDT
postgres(at)template1 10.28.53.11(53444)FATAL: the database system is
starting up
4879 2011-07-24 10:56:07 CDT 2011-07-24 10:56:07 CDT postgres(at)postgres
[local]FATAL: the database system is starting up
4760 2011-07-24 10:56:07 CDT 2011-07-24 10:56:07 CDT @ LOG: invalid
record length at 57/6B2BA010
4859 2011-07-24 10:56:07 CDT 2011-07-24 10:56:07 CDT @ FATAL:
terminating walreceiver process due to administrator command
4760 2011-07-24 10:56:07 CDT 2011-07-24 10:56:07 CDT @ LOG: invalid
magic number 0000 in log file 87, segment 107, offset 2883584
4887 2011-07-24 10:56:08 CDT 2011-07-24 10:56:08 CDT postgres(at)postgres
[local]FATAL: the database system is starting up
4888 2011-07-24 10:56:08 CDT 2011-07-24 10:56:08 CDT @ LOG: streaming
replication successfully connected to primary
4892 2011-07-24 10:56:09 CDT 2011-07-24 10:56:09 CDT postgres(at)postgres
[local]FATAL: the database system is starting up
4896 2011-07-24 10:56:09 CDT 2011-07-24 10:56:09 CDT
postgres(at)template1 10.28.53.11(53445)FATAL: the database system is
starting up
4901 2011-07-24 10:56:10 CDT 2011-07-24 10:56:10 CDT postgres(at)postgres
[local]FATAL: the database system is starting up
4906 2011-07-24 10:56:11 CDT 2011-07-24 10:56:11 CDT postgres(at)postgres
[local]FATAL: the database system is starting up
4760 2011-07-24 10:56:11 CDT 2011-07-24 10:56:11 CDT @ LOG: invalid
record length at 57/6B486010
4888 2011-07-24 10:56:11 CDT 2011-07-24 10:56:11 CDT @ FATAL:
terminating walreceiver process due to administrator command
4760 2011-07-24 10:56:11 CDT 2011-07-24 10:56:11 CDT @ LOG: invalid
magic number 0000 in log file 87, segment 107, offset 4849664
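
To make the log easier to read, the WAL locations in the "invalid record length" lines can be split into the same log file / segment numbers that the "invalid magic number" lines report. The following is only a sketch: it assumes the default 16 MB WAL segment size and the two-part hexadecimal location format used by this PostgreSQL generation, and the timeline ID of 1 used to build the file name is a guess, not something the log shows. The helper name decode_wal_location is made up for illustration.

```python
# Assumption: default 16 MB WAL segments and "xlogid/xrecoff" hex locations.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def decode_wal_location(location, timeline=1):
    """Split a location such as '57/6B20E010' into its parts.

    Returns (log_file, segment, offset_in_segment, wal_file_name).
    The timeline default of 1 is an assumption, not taken from the log.
    """
    xlogid_hex, xrecoff_hex = location.split("/")
    log_file = int(xlogid_hex, 16)   # 0x57 -> 87
    rec_off = int(xrecoff_hex, 16)   # byte offset within the log file
    segment, offset = divmod(rec_off, WAL_SEGMENT_SIZE)
    wal_file_name = "%08X%08X%08X" % (timeline, log_file, segment)
    return log_file, segment, offset, wal_file_name


if __name__ == "__main__":
    # The three locations reported as "invalid record length" above.
    for loc in ("57/6B20E010", "57/6B2BA010", "57/6B486010"):
        print(loc, decode_wal_location(loc))
```

Under those assumptions all three locations fall in log file 87, segment 107, which matches the "invalid magic number" lines; the offsets reported there (2490368, 2883584, 4849664) sit slightly past each decoded location in that same segment, and they advance on every reconnect.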

On Sun, Jul 24, 2011 at 8:46 PM, Yan Chunlu <springrider(at)gmail(dot)com> wrote:
> checkpoint_segments = 64
> wal_keep_segments = 128
>
> On Sun, Jul 24, 2011 at 8:25 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>> On 24 July 2011, 6:09, Yan Chunlu wrote:
>>> thanks for all the help!
>>>
>>> @Adrian:  yes, only one instance on each machine
>>>
>>> now the slave finally started and could be connected to, but replication
>>> didn't begin, just the following errors:
>>> https://gist.github.com/1102225
>>
>> These errors just mean the master already removed WAL segments, so the
>> slave can't actually start the replication because there'd be a gap. This
>> usually happens with enough write activity (inserts, updates) when the
>> slave is being set up.
>>
>> What is your wal_keep_segments value? Increase it or set up WAL
>> archiving, so that the slave can get the data.
>>
>> Tomas
>>
>>
>
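
As a rough check on the wal_keep_segments = 128 value quoted above, the amount of WAL the master is asked to retain, and how long that lasts under a given write load, can be estimated as follows. This is a back-of-the-envelope sketch only: it assumes the default 16 MB segment size, the 5 MiB/s write rate is a made-up figure rather than a measurement from this system, and both function names are hypothetical.

```python
# Assumption: default WAL segment size of 16 MiB.
WAL_SEGMENT_SIZE_MIB = 16


def retained_wal_mib(wal_keep_segments):
    """Minimum amount of WAL (in MiB) kept for standbys by wal_keep_segments."""
    return wal_keep_segments * WAL_SEGMENT_SIZE_MIB


def catchup_window_seconds(wal_keep_segments, write_rate_mib_per_s):
    """Seconds of WAL history retained at a steady WAL write rate."""
    return retained_wal_mib(wal_keep_segments) / write_rate_mib_per_s


if __name__ == "__main__":
    print(retained_wal_mib(128))             # MiB retained with the setting above
    print(catchup_window_seconds(128, 5.0))  # 5.0 MiB/s is a hypothetical rate
```

With 128 segments this comes to about 2 GiB of retained WAL. If taking the base backup and copying it across the datacenter link takes longer than the resulting window, the standby would hit the gap Tomas describes, and WAL archiving would be the safer choice.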
