From: | Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Streaming replication - unable to stop the standby |
Date: | 2010-05-03 18:22:16 |
Message-ID: | 4BDF1458.1040807@kaltenbrunner.cc |
Lists: | pgsql-hackers |
Tom Lane wrote:
> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
>> I'm currently testing SR/HS in 9.0beta1 and I noticed that it seems
>> quite easy to end up in a situation where you have a standby that seems
>> to be stuck in:
>
>> $ psql -p 5433
>> psql: FATAL: the database system is shutting down
>
>> but not actually shutting down ever. I ran into that a few times now
>> (mostly because I'm trying to chase a recovery issue I hit during
>> earlier testing) by simply having the master iterate between a pgbench
>> run and "idle" while simply doing pg_ctl restart in a loop on the standby.
>> I do vaguely recall some discussions of that but I thought the issue got
>> settled somehow?
>
> Hm, I haven't pushed this hard but "pg_ctl stop" seems to stop the
> standby for me. Which subprocesses of the slave postmaster are still
> around? Could you attach to them with gdb and get stack traces?
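The restart loop described in the quoted text can be sketched roughly as follows. This is only an illustration, not the exact script that was used: the function name restart_loop, the PG_CTL and PGDATA_STANDBY variables, and the 60 second timeout are assumptions; the data directory is the one shown in the process tree further down.

```shell
#!/bin/sh
# Assumed defaults; override them in the environment as needed.
PG_CTL=${PG_CTL:-pg_ctl}
PGDATA_STANDBY=${PGDATA_STANDBY:-/mnt/space/pgdata_standby}

# Hypothetical helper: restart the standby N times and stop at the
# first restart that fails or does not finish within the timeout.
restart_loop() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        # -w waits for completion, -t sets the wait timeout in seconds.
        if ! "$PG_CTL" -D "$PGDATA_STANDBY" -w -t 60 restart; then
            echo "restart $i failed or timed out"
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $n restarts"
}
```

Calling `restart_loop 100` while the master alternates between a pgbench run and idle would approximate the test; a non-zero return marks the iteration where the standby got stuck.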
It does not always fail to shut down; it only fails sometimes. I have
not yet pinpointed exactly what is causing this, but the standby is in
a weird state now:
* the master is currently idle
* the standby has no connections at all
logs from the standby:
FATAL: the database system is shutting down
FATAL: the database system is shutting down
FATAL: replication terminated by primary server
LOG: restored log file "000000010000001900000054" from archive
cp: cannot stat `/mnt/space/wal-archive/000000010000001900000055': No such file or directory
LOG: record with zero length at 19/55000078
cp: cannot stat `/mnt/space/wal-archive/000000010000001900000055': No such file or directory
FATAL: could not connect to the primary server: could not connect to server: Connection refused
    Is the server running on host "localhost" and accepting TCP/IP connections on port 5432?
could not connect to server: Connection refused
    Is the server running on host "localhost" and accepting TCP/IP connections on port 5432?
cp: cannot stat `/mnt/space/wal-archive/000000010000001900000055': No such file or directory
cp: cannot stat `/mnt/space/wal-archive/000000010000001900000055': No such file or directory
LOG: streaming replication successfully connected to primary
FATAL: the database system is shutting down
The first two "FATAL: the database system is shutting down" lines are from me
trying to connect using psql after I noticed that pg_ctl had failed to
shut down the slave.
The next thing I tried was restarting the master, which led to the
log lines following those two: the standby noticed the restart and
reconnected, but you still cannot actually connect to it...
process tree for the standby is:
29523 pts/2 S 0:00 /home/postgres9/pginst/bin/postgres -D /mnt/space/pgdata_standby
29524 ? Ss 0:06 \_ postgres: startup process waiting for 000000010000001900000055
29529 ? Ss 0:00 \_ postgres: writer process
29835 ? Ss 0:00 \_ postgres: wal receiver process streaming 19/55000078
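For the stack traces requested above, a minimal sketch of how they could be collected from these three subprocesses. It assumes gdb is installed and that the PIDs are still the ones in the process tree; the helper name bt_command is hypothetical, and it only prints each command so it can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical helper: print the gdb invocation that dumps a backtrace
# for one PID. -p attaches to the running process, -batch makes gdb exit
# after its commands, and -ex runs one gdb command ("bt" = backtrace).
bt_command() {
    printf 'gdb -p %s -batch -ex bt\n' "$1"
}

# PIDs from the process tree above: startup, writer, wal receiver.
for pid in 29524 29529 29835; do
    bt_command "$pid"
done
```

Running the printed commands as the postgres user (or another user allowed to attach to those processes) should produce one backtrace per subprocess.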
Stefan