What am I doing wrong?

From: François Beausoleil <francois(at)teksol(dot)info>
To: Postgres List <pgsql-general(at)postgresql(dot)org>
Subject: What am I doing wrong?
Date: 2012-09-25 00:51:11
Message-ID: BD373AE1-092E-44E1-B621-E4B9349FA88E@teksol.info
Lists: pgsql-general

I'm in the single-slave scenario, with hot standby capabilities, meaning I want to run queries on the slave. I'm running some tests to evaluate pgbarman, on Ubuntu 11.10. I used only packaged PostgreSQL, and I'm running version "PostgreSQL 9.1.5 on x86_64-pc-linux-gnu, compiled by gcc-4.6.real (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1, 64-bit". Both the master and the slave are running on the same host.

master/postgresql.conf

port = 5432
archive_mode = on
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 256
archive_command = '/bin/cp --verbose %p /var/pgexchange/%f'

master/pg_hba.conf (as I said, testing config only):

host replication postgres 127.0.0.1/32 trust

slave/postgresql.conf:
port = 5433
hot_standby = on
hot_standby_feedback = on
max_standby_archive_delay = -1
max_standby_streaming_delay = -1

slave/pg_hba.conf -- all at default

/var/lib/postgresql/9.1/slave0/recovery.conf:

standby_mode = on
restore_command = '/bin/cp --verbose /var/pgexchange/%f %p'
primary_conninfo = 'host=localhost port=5432 user=postgres password=supersecretpassword'
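As a sanity check on the settings listed above, here is a minimal sketch that parses `key = value` lines and tests the usual hot standby prerequisites (`wal_level = hot_standby` and at least one WAL sender on the master, `hot_standby = on` on the slave). The function names are hypothetical, and the parser is deliberately naive: it assumes no `#` inside quoted values, which holds for the snippets in this message. It only checks the text it is given, not the files the running servers actually loaded.

```python
def parse_conf(text):
    """Parse simple 'key = value' lines into a dict (naive: '#' starts a comment)."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)  # split once, values may contain '='
        settings[key.strip()] = value.strip().strip("'")
    return settings


def hot_standby_problems(master, slave):
    """Return a list of human-readable problems; empty if the basics look right."""
    problems = []
    if master.get("wal_level") != "hot_standby":
        problems.append("master: wal_level is not hot_standby")
    if int(master.get("max_wal_senders", "0")) < 1:
        problems.append("master: max_wal_senders is 0")
    if slave.get("hot_standby") != "on":
        problems.append("slave: hot_standby is not on")
    return problems


# Settings copied from this message.
master_conf = parse_conf("""
port = 5432
archive_mode = on
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 256
archive_command = '/bin/cp --verbose %p /var/pgexchange/%f'
""")
slave_conf = parse_conf("""
port = 5433
hot_standby = on
hot_standby_feedback = on
max_standby_archive_delay = -1
max_standby_streaming_delay = -1
""")

print(hot_standby_problems(master_conf, slave_conf) or "no obvious problems")
```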

The slave's log says it's connected to the master, but I can't connect to the slave.

# psql -h localhost -p 5433 -U postgres -d mydb
psql: FATAL: the database system is starting up
FATAL: the database system is starting up

The slave's log, after a fresh pg_basebackup + restore for the slave, contains:

==> /var/log/postgresql/postgresql-9.1-slave0.log <==
2012-09-25 00:46:22 UTC LOG: database system was interrupted; last known up at 2012-09-25 00:44:20 UTC
2012-09-25 00:46:22 UTC LOG: creating missing WAL directory "pg_xlog/archive_status"
2012-09-25 00:46:22 UTC LOG: entering standby mode
`/var/pgexchange/000000010000000000000016' -> `pg_xlog/RECOVERYXLOG'
2012-09-25 00:46:22 UTC LOG: restored log file "000000010000000000000016" from archive
2012-09-25 00:46:23 UTC LOG: redo starts at 0/16000020
2012-09-25 00:46:23 UTC LOG: consistent recovery state reached at 0/17000000
/bin/cp: cannot stat `/var/pgexchange/000000010000000000000017': No such file or directory
2012-09-25 00:46:23 UTC LOG: incomplete startup packet
2012-09-25 00:46:23 UTC LOG: streaming replication successfully connected to primary
2012-09-25 00:46:23 UTC FATAL: the database system is starting up
2012-09-25 00:46:24 UTC FATAL: the database system is starting up
2012-09-25 00:46:24 UTC FATAL: the database system is starting up

The "system is starting up" messages come from the pg_ctlcluster script, which attempts to connect to the database to check whether the server is up and available. According to the log above, a consistent state is reached, and the slave connects to the primary. During the slave's reconnection, the master emits no messages.
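To see which startup milestones the slave has logged so far, the log can be scanned for known phrases. A minimal sketch: the first three phrases are copied from the excerpt above; the fourth is the line a 9.1 standby normally logs once it accepts read-only connections, and it is assumed here that the packaged build uses the same wording. The function name and labels are made up for illustration.

```python
# (label, phrase) pairs. The last phrase does not appear in the excerpt
# above; it is the message expected once hot standby connections are allowed
# (assumption: same wording in the Ubuntu-packaged 9.1.5 build).
MILESTONES = [
    ("entered standby mode", "entering standby mode"),
    ("consistent state reached", "consistent recovery state reached"),
    ("streaming from primary",
     "streaming replication successfully connected to primary"),
    ("accepting read-only connections",
     "database system is ready to accept read only connections"),
]


def recovery_milestones(log_text):
    """Return {label: bool} telling whether each milestone phrase was logged."""
    return {label: phrase in log_text for label, phrase in MILESTONES}


# A few lines copied from the log excerpt in this message.
sample = """\
2012-09-25 00:46:22 UTC LOG: entering standby mode
2012-09-25 00:46:23 UTC LOG: redo starts at 0/16000020
2012-09-25 00:46:23 UTC LOG: consistent recovery state reached at 0/17000000
2012-09-25 00:46:23 UTC LOG: streaming replication successfully connected to primary
2012-09-25 00:46:23 UTC FATAL: the database system is starting up
"""

for label, seen in recovery_milestones(sample).items():
    print(f"{label}: {'yes' if seen else 'MISSING'}")
```

In practice the whole file would be read instead of `sample`, for example `open("/var/log/postgresql/postgresql-9.1-slave0.log").read()`.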

On the master, pg_stat_replication looks fine:

# select * from pg_stat_replication ;
 procpid | usesysid | usename  | application_name | client_addr | client_hostname | client_port |         backend_start         |   state   | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
---------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
   27920 |       10 | postgres | walreceiver      | 127.0.0.1   |                 |       52193 | 2012-09-25 00:46:23.100631+00 | streaming | 0/17000000    | 0/17000000     | 0/17000000     | 0/17000000      |             0 | async

state == streaming; sent == write == flush == replay, so the slave seems to be consistent.
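The same comparison can be scripted rather than read off by eye. A minimal sketch, assuming the four location columns have already been fetched into a dict (the function names are hypothetical). Each `X/Y` location is parsed as two hexadecimal numbers and compared as a pair, so no assumption is made about how the two halves map to a byte offset.

```python
LOCATION_COLUMNS = (
    "sent_location",
    "write_location",
    "flush_location",
    "replay_location",
)


def parse_xlog_location(text):
    """Split an 'X/Y' WAL location into a (high, low) tuple of integers."""
    high, low = text.strip().split("/")
    return int(high, 16), int(low, 16)


def standby_caught_up(row):
    """True if sent, write, flush and replay locations are all identical."""
    locations = [parse_xlog_location(row[col]) for col in LOCATION_COLUMNS]
    return all(loc == locations[0] for loc in locations)


# Values copied from the pg_stat_replication row in this message.
row = {
    "state": "streaming",
    "sent_location": "0/17000000",
    "write_location": "0/17000000",
    "flush_location": "0/17000000",
    "replay_location": "0/17000000",
}

print(row["state"] == "streaming" and standby_caught_up(row))
```

Note that this only says the slave has replayed everything the master sent; it says nothing about whether the slave accepts connections.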

What am I missing here?

Thanks!
François
