Minimal streaming replication

From: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Minimal streaming replication
Date: 2012-06-25 23:47:10
Message-ID: 4FE8F87E.5020505@pinpointresearch.com
Lists: pgsql-general

I'm attempting to set up minimal/simple replication with one master and
one standby using the following pair of identical machines connected
through a 1-Gb switch:
3.2.0-25-generic #40-Ubuntu SMP Wed May 23 20:30:51 UTC 2012 x86_64
x86_64 x86_64 GNU/Linux
PostgreSQL 9.1.4 on x86_64-unknown-linux-gnu, compiled by gcc
(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

The documentation says "To use streaming replication, set up a
file-based log-shipping standby server as described in Section 25.2...."
However, I'm not using any of the archive or restore commands. Instead,
I use pg_basebackup to do the initial copy, in a script that at its core
runs pg_basebackup and then starts the standby server. So...

Given a sufficiently large wal_keep_segments on the master, is this a
reasonable approach?
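
For concreteness, the core of such a script might look roughly like the
sketch below. This is only an illustration, not the actual script: the
host name, data directory, and helper names are hypothetical, and it
assumes the 9.1-era pg_basebackup options (-h, -U, -D, -x, -c) and a
recovery.conf containing standby_mode and primary_conninfo.

```python
def basebackup_command(master_host, data_dir, user="postgres"):
    """Build the pg_basebackup argument list for the initial copy."""
    return [
        "pg_basebackup",
        "-h", master_host,  # master to copy from (hypothetical host name)
        "-U", user,         # role allowed to make replication connections
        "-D", data_dir,     # empty target data directory on the standby
        "-x",               # include the WAL needed by the backup
        "-c", "fast",       # ask for an immediate checkpoint
    ]


def recovery_conf(master_host, user="postgres", port=5432):
    """Render a minimal recovery.conf with no restore_command."""
    conninfo = "host=%s port=%d user=%s" % (master_host, port, user)
    return "standby_mode = 'on'\nprimary_conninfo = '%s'\n" % conninfo


if __name__ == "__main__":
    # Hypothetical values; print what the script would run and write.
    print(" ".join(basebackup_command("master.example", "/tmp/standby")))
    print(recovery_conf("master.example"))
```

With no restore_command in recovery.conf, the standby can only catch up
from WAL that the master still holds, which is why wal_keep_segments
matters for this approach.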

Is there a disadvantage, other than disk-space required, to having
wal_keep_segments set to a fairly large number, say 256 or 512?
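
As a rough check on the disk-space side, here is a small sketch of the
arithmetic. It assumes the default 16 MB WAL segment size (a different
size can be chosen at build time), and the function name is just for
illustration.

```python
# Assumption: default WAL segment size of 16 MiB.
SEGMENT_BYTES = 16 * 1024 * 1024


def wal_keep_bytes(wal_keep_segments, segment_bytes=SEGMENT_BYTES):
    """Minimum WAL retained on the master for a given wal_keep_segments."""
    return wal_keep_segments * segment_bytes


if __name__ == "__main__":
    for n in (256, 512):
        # 256 segments -> 4.0 GiB, 512 segments -> 8.0 GiB
        print("%d segments -> %.1f GiB" % (n, wal_keep_bytes(n) / 2.0 ** 30))
```

This is a lower bound on what is kept for standbys; pg_xlog can hold
additional segments beyond this depending on checkpoint settings.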

Once replication was running I tried to stress/break it. I started
pgbench with 100 clients and then simultaneously started a restore (12
GB of tables plus associated indexes). It *seems* to work. I get
appropriate results from test queries, and the master and standby
monitoring queries seem reasonable (queries taken at different times -
log locations won't match):

--Standby
select
pg_last_xlog_receive_location(),
pg_last_xlog_replay_location(),
now()-pg_last_xact_replay_timestamp() as log_delay;
 pg_last_xlog_receive_location | pg_last_xlog_replay_location |    log_delay
-------------------------------+------------------------------+-----------------
 1F2/F4E4F8C0                  | 1F2/F4E4F8C0                 | 00:00:00.995516

--Master
select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
procpid | 25945
usesysid | 10
usename | postgres
application_name | walreceiver
client_addr | 192.168.4.215
client_hostname |
client_port | 41335
backend_start | 2012-06-25 15:59:02.833441-07
state | streaming
sent_location | 1F3/659F2000
write_location | 1F3/659D3538
flush_location | 1F3/659D3538
replay_location | 1F3/659C1570
sync_priority | 0
sync_state | async
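
To put numbers on the pg_stat_replication output above, the locations
can be converted to byte positions and subtracted. The sketch below is
an illustration only: the function names are made up, and the
0xFF000000 multiplier is an assumption about the pre-9.3 WAL layout
(255 segments of 16 MB per log file). For locations within the same log
file, as here, the multiplier cancels out.

```python
# Assumption: pre-9.3 layout, 255 usable 16 MiB segments per xlogid.
BYTES_PER_XLOGID = 0xFF000000


def xlog_to_bytes(location):
    """Convert an 'xlogid/xrecoff' hex location to an absolute byte position."""
    xlogid, xrecoff = location.split("/")
    return int(xlogid, 16) * BYTES_PER_XLOGID + int(xrecoff, 16)


def lag_bytes(ahead, behind):
    """Bytes of WAL between two locations."""
    return xlog_to_bytes(ahead) - xlog_to_bytes(behind)


if __name__ == "__main__":
    sent = "1F3/659F2000"
    flushed = "1F3/659D3538"
    replayed = "1F3/659C1570"
    print("flush lag :", lag_bytes(sent, flushed))   # 125640 bytes
    print("replay lag:", lag_bytes(sent, replayed))  # 199312 bytes
```

So at the moment of that query the standby was under 200 kB behind what
the master had sent.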

However, I'm seeing troubling messages in the logs. While running pgbench,
I see the following types of messages on the master at intervals of one
to a few minutes:

2012-06-25 16:15:51 PDT WARNING: pgstat wait timeout
2012-06-25 16:16:26 PDT LOG: SSL renegotiation failure
2012-06-25 16:16:26 PDT LOG: SSL error: unexpected record
2012-06-25 16:16:26 PDT LOG: could not send data to client: Connection reset by peer

The standby has the following sorts of messages:
...
2012-06-25 11:12:11 PDT FATAL: could not receive data from WAL stream: SSL error: sslv3 alert unexpected message
2012-06-25 11:12:11 PDT LOG: record with zero length at 1C5/95D2FE00
2012-06-25 11:12:26 PDT LOG: streaming replication successfully connected to primary
...
2012-06-25 11:30:59 PDT LOG: unexpected pageaddr 1C7/C9FAE000 in log file 456, segment 173, offset 16441344
2012-06-25 11:30:59 PDT LOG: streaming replication successfully connected to primary
...
2012-06-25 11:36:26 PDT FATAL: could not send data to WAL stream: SSL error: sslv3 alert unexpected message
2012-06-25 11:36:26 PDT LOG: invalid magic number 0000 in log file 457, segment 173, offset 15851520
...
2012-06-25 11:36:41 PDT LOG: streaming replication successfully connected to primary
...

Any advice on what this is telling me? I'm not keen on words like
"FATAL" in my logs.

Cheers,
Steve
