Quick Links

PostgreSQL walsender process doesn't exist after "pg_ctl stop -m fast"

From:	y39chen <yan-jack(dot)chen(at)nokia(dot)com>
To:	pgsql-general(at)postgresql(dot)org
Subject:	PostgreSQL walsender process doesn't exist after "pg_ctl stop -m fast"
Date:	2017-11-13 05:25:10
Message-ID:	1510550710287-0.post@n3.nabble.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

We encounter one problem that PostgreSQL walsender process doesn't exist
after "pg_ctl stop -m fast".
Uses PostgreSQL 9.6.2
Steps:
1)active postgres server is up;
2)standby postgres intance take pg_basebackup
3)usin command "pg_ctl stop -W -m fast -D /mnt/db/DBTestPostgres/db_data" to
stop active postgres server;
4)active postgres server can't stop successfully. It hanging by walsender
process and can't exit (we wait about 10min)
5)kill -9 walsender process, it exit right now.

1. make postgres coredump, the log as bellow

Stack trace of thread 2288:
#0 0x00007f7fae6ea013 __select
(libc.so.6)
#1 0x000000000046df6b
ServerLoop (postgres)
#2 0x000000000060d528
PostmasterMain (postgres)
#3 0x000000000046f2dc main
(postgres)
#4 0x00007f7fae62a461
__libc_start_main (libc.so.6)
#5 0x000000000046f35a _start
(postgres)

Nov 10 10:08:02 mn-1 systemd-coredump[8994]: Process 2645 (postgres) of user
550 dumped core.

Stack trace of thread 2645:
#0 0x00007f7fae6f1d63
epoll_wait (libc.so.6)
#1 0x000000000063ce69
WaitEventSetWait (postgres)
#2 0x000000000063d1ab
WaitLatchOrSocket (postgres)
#3 0x0000000000622ead
WalSndLoop (postgres)
#4 0x0000000000623a8f
exec_replication_command (postgres)
#5 0x00000000006558fd
PostgresMain (postgres)
#6 0x000000000046e631
ServerLoop (postgres)
#7 0x000000000060d528
PostmasterMain (postgres)
#8 0x000000000046f2dc main
(postgres)
#9 0x00007f7fae62a461
__libc_start_main (libc.so.6)
#10 0x000000000046f35a _start
(postgres)

2. active postgres intance log:

Nov 09 12:29:53 mn-1 postgres[6073]: [3-1] DEBUG: autovacuum: processing
database "DBTestPostgres"
Nov 09 12:30:13 mn-1 postgres[6204]: [3-1] DEBUG: autovacuum: processing
database "postgreswd"
Nov 09 12:30:20 mn-1 postgres[2231]: [35-1] DEBUG: checkpoint sync:
number=1 file=base/16385/16467 time=14.990 msec
Nov 09 12:30:20 mn-1 postgres[2231]: [36-1] LOG: checkpoint complete: wrote
1065 buffers (6.5%); 0 transaction log file(s) added, 0 removed, 2 recycled;
write=59.892 s, sync=0.015 s, total=59.927 s; sync files=1, longest=0.014 s,
average=0.014 s; distance=12487 kB, estimate=19411 kB
Nov 09 12:30:20 mn-1 postgres[2231]: [37-1] LOG: checkpoint starting: force
wait
Nov 09 12:30:20 mn-1 postgres[2231]: [38-1] DEBUG: performing replication
slot checkpoint
Nov 09 12:30:21 mn-1 postgres[2231]: [39-1] DEBUG: checkpoint sync:
number=1 file=base/16385/16467 time=59.400 msec
Nov 09 12:30:21 mn-1 postgres[2231]: [40-1] LOG: checkpoint complete: wrote
1 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled;
write=1.102 s, sync=0.059 s, total=1.168 s; sync files=1, longest=0.059 s,
average=0.059 s; distance=3900 kB, estimate=17860 kB
Nov 09 12:30:23 mn-1 postgres[2235]: [28-1] DEBUG: archived transaction log
file "000000020000000000000193"
Nov 09 12:30:23 mn-1 postgres[2235]: [29-1] DEBUG: archived transaction log
file "000000020000000000000193.00000108.backup"
Nov 09 12:30:24 mn-1 postgres[6263]: [3-1] DEBUG: received replication
command: IDENTIFY_SYSTEM
Nov 09 12:30:24 mn-1 postgres[6263]: [4-1] DEBUG: received replication
command: START_REPLICATION 0/65000000 TIMELINE 2
Nov 09 12:30:24 mn-1 postgres[6263]: [5-1] DEBUG: standby "walreceiver" has
now caught up with primary

Nov 09 12:30:24 mn-1 dbim-postgres[2039]: [2039-139870943506432] ....
/opt/nokia/libexec/SS_RCPPostgress/bin/pg_ctl stop -W -m fast -D
/mnt/db/DBTestPostgres/db_data
Nov 09 12:30:24 mn-1 postgres[2157]: [3-1] LOG: received fast shutdown
request
Nov 09 12:30:24 mn-1 postgres[2157]: [4-1] LOG: aborting any active
transactions
Nov 09 12:30:24 mn-1 postgres[6073]: [4-1] err-2: terminating autovacuum
process due to administrator command
Nov 09 12:30:24 mn-1 postgres[2234]: [3-1] LOG: autovacuum launcher
shutting down
Nov 09 12:30:24 mn-1 postgres[2231]: [41-1] LOG: shutting down
......
Nov 09 12:30:44 mn-1 postgres[6408]: [5-1] err-2: the database system is
shutting down
Nov 09 12:30:46 mn-1 postgres[6411]: [5-1] err-2: the database system is
shutting down
Nov 09 12:30:48 mn-1 postgres[6415]: [5-1] err-2: the database system is
shutting down
Nov 09 12:30:50 mn-1 postgres[6434]: [5-1] err-2: the database system is
shutting down
Nov 09 12:30:52 mn-1 postgres[6442]: [5-1] err-2: the database system is
shutting down

3. slave instance log:
Nov 09 12:30:23 mn-0 postgres[11641]: [1-1] LOG: ending log output to
stderr
Nov 09 12:30:23 mn-0 dbim-postgres[11104]: LOG: ending log output to stderr
Nov 09 12:30:23 mn-0 dbim-postgres[11104]: HINT: Future log output will go
to log destination "syslog".
Nov 09 12:30:23 mn-0 postgres[11641]: [1-2] HINT: Future log output will go
to log destination "syslog".
Nov 09 12:30:23 mn-0 postgres[11642]: [2-1] LOG: database system was
interrupted; last known up at 2017-11-09 10:30:21 GMT
Nov 09 12:30:23 mn-0 postgres[11642]: [3-1] LOG: entering standby mode
Nov 09 12:30:23 mn-0 postgres[11642]: [4-1] DEBUG: checkpoint record is at
0/64C00140
Nov 09 12:30:23 mn-0 postgres[11642]: [5-1] DEBUG: redo record is at
0/64C00108; shutdown FALSE
Nov 09 12:30:23 mn-0 postgres[11642]: [6-1] DEBUG: next transaction ID:
0:464; next OID: 16473
Nov 09 12:30:23 mn-0 postgres[11642]: [7-1] DEBUG: next MultiXactId: 1;
next MultiXactOffset: 0
Nov 09 12:30:23 mn-0 postgres[11642]: [8-1] DEBUG: oldest unfrozen
transaction ID: 418, in database 1
Nov 09 12:30:23 mn-0 postgres[11642]: [9-1] DEBUG: oldest MultiXactId: 1,
in database 1
Nov 09 12:30:23 mn-0 postgres[11642]: [10-1] DEBUG: commit timestamp Xid
oldest/newest: 0/0
Nov 09 12:30:23 mn-0 postgres[11642]: [11-1] DEBUG: transaction ID wrap
limit is 2147484065, limited by database with OID 1
Nov 09 12:30:23 mn-0 postgres[11642]: [12-1] DEBUG: MultiXactId wrap limit
is 2147483648, limited by database with OID 1
Nov 09 12:30:23 mn-0 postgres[11642]: [13-1] DEBUG: starting up replication
slots
Nov 09 12:30:23 mn-0 postgres[11642]: [14-1] DEBUG: resetting unlogged
relations: cleanup 1 init 0
Nov 09 12:30:23 mn-0 postgres[11642]: [15-1] DEBUG: initializing for hot
standby
Nov 09 12:30:23 mn-0 postgres[11642]: [16-1] LOG: redo starts at 0/64C00108
Nov 09 12:30:23 mn-0 postgres[11642]: [17-1] DEBUG: recovery snapshots are
now enabled
Nov 09 12:30:23 mn-0 postgres[11642]: [17-2] CONTEXT: xlog redo at
0/64C00108 for Standby/RUNNING_XACTS: nextXid 464 latestCompletedXid 463
oldestRunningXid 464
Nov 09 12:30:23 mn-0 postgres[11642]: [18-1] DEBUG: end of backup reached
Nov 09 12:30:23 mn-0 postgres[11642]: [18-2] CONTEXT: xlog redo at
0/64C001B0 for XLOG/BACKUP_END: 0/64C00108
Nov 09 12:30:23 mn-0 postgres[11642]: [19-1] LOG: consistent recovery state
reached at 0/64C001D8
Nov 09 12:30:23 mn-0 postgres[11641]: [2-1] LOG: database system is ready
to accept read only connections
Nov 09 12:30:23 mn-0 postgres[11650]: [3-1] LOG: started streaming WAL from
primary at 0/65000000 on timeline 2
Nov 09 12:30:24 mn-0 postgres[11642]: [20-1] DEBUG: transaction ID wrap
limit is 2147484065, limited by database with OID 1
Nov 09 12:30:24 mn-0 postgres[11642]: [20-2] CONTEXT: xlog redo at
0/65400028 for XLOG/CHECKPOINT_SHUTDOWN: redo 0/65400028; tli 2; prev tli 2;
fpw true; xid 0:464; oid 16473; multi 1; offset 0; oldest xid 418 in DB 1;
oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest
running xid 0; shutdown
Nov 09 12:30:24 mn-0 postgres[11650]: [4-1] LOG: replication terminated by
primary server
Nov 09 12:30:24 mn-0 postgres[11650]: [4-2] DETAIL: End of WAL reached on
timeline 2 at 0/65400098.
Nov 09 12:30:24 mn-0 postgres[11650]: [5-1] err-2: could not send
end-of-streaming message to primary: no COPY in progress
Nov 09 12:30:24 mn-0 postgres[11650]: [5-2]
Nov 09 12:30:24 mn-0 postgres[11642]: [21-1] LOG: invalid record length at
0/65400098: wanted 24, got 0
Nov 09 12:30:24 mn-0 postgres[11701]: [3-1] err-2: could not connect to the
primary server: err-2: the database system is shutting down
Nov 09 12:30:24 mn-0 postgres[11701]: [3-2]
Nov 09 12:30:26 mn-0 postgres[11642]: [22-1] DEBUG: invalid record length
at 0/65400098: wanted 24, got 0
Nov 09 12:30:26 mn-0 postgres[11720]: [3-1] err-2: could not connect to the
primary server: err-2: the database system is shutting down
Nov 09 12:30:26 mn-0 postgres[11720]: [3-2]
Nov 09 12:30:28 mn-0 postgres[11642]: [23-1] DEBUG: invalid record length
at 0/65400098: wanted 24, got 0
Nov 09 12:30:28 mn-0 postgres[11739]: [3-1] err-2: could not connect to the
primary server: err-2: the database system is shutting down
Nov 09 12:30:28 mn-0 postgres[11739]: [3-2]
Nov 09 12:30:30 mn-0 postgres[11642]: [24-1] DEBUG: invalid record length
at 0/65400098: wanted 24, got 0
Nov 09 12:30:30 mn-0 postgres[11767]: [3-1] err-2: could not connect to the
primary server: err-2: the database system is shutting down
Nov 09 12:30:30 mn-0 postgres[11767]: [3-2]
Nov 09 12:30:32 mn-0 postgres[11642]: [25-1] DEBUG: invalid record length
at 0/65400098: wanted 24, got 0
Nov 09 12:30:32 mn-0 postgres[11770]: [3-1] err-2: could not connect to the
primary server: err-2: the database system is shutting down

Nov 09 12:30:54 mn-0 postgres[11642]: [36-1] DEBUG: invalid record length
at 0/65400098: wanted 24, got 0
Nov 09 12:30:54 mn-0 postgres[12088]: [3-1] err-2: could not connect to the
primary server: err-2: the database system is shutting down
Nov 09 12:30:54 mn-0 postgres[12088]: [3-2]

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html

Responses

Re: PostgreSQL walsender process doesn't exist after "pg_ctl stop -m fast" at 2017-11-13 15:05:09 from Tom Lane

Browse pgsql-general by date

	From	Date	Subject
Next Message	Andreas Joseph Krogh	2017-11-13 08:27:33	PG-10 + ICU and abbreviated keys
Previous Message	John R Pierce	2017-11-13 03:48:27	Re: Migrating plattaform