From: | Stephen Frost <sfrost(at)snowman(dot)net> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | pg_rewind test race condition..? |
Date: | 2015-04-28 18:02:54 |
Message-ID: | 20150428180253.GU30322@tamriel.snowman.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Heikki,
Not sure if anyone else is seeing this, but I'm getting regression
test failures when running the pg_rewind tests pretty consistently
with 'make check'. Specifically with "basic remote", I'm getting:
source and target cluster are on the same timeline
Failure, exiting
in regress_log/pg_rewind_log_basic_remote.
If I throw a "sleep(5);" into t/001_basic.pl before the call to
RewindTest::run_pg_rewind($test_mode); then everything works fine.
./configure --silent --prefix="$INSTALL" --with-openssl --with-tcl
--with-tclconfig=/usr/lib/tcl8.6 --with-perl --enable-debug
--enable-cassert --enable-tap-tests --with-gssapi
Full unsuccessful pg_rewind_log_basic_remote output:
-----------------------------------------------------------------
waiting for server to start....LOG: database system was shut down at 2015-04-28 13:46:34 EDT
LOG: database system is ready to accept connections
done
server started
waiting for server to start....LOG: database system was interrupted; last known up at 2015-04-28 13:46:35 EDT
LOG: entering standby mode
LOG: redo starts at 0/2000028
LOG: consistent recovery state reached at 0/20000F8
LOG: database system is ready to accept read only connections
LOG: started streaming WAL from primary at 0/3000000 on timeline 1
done
server started
server promoting
LOG: received promote request
FATAL: terminating walreceiver process due to administrator command
LOG: invalid record length at 0/30EB410
LOG: redo done at 0/30EB3A0
LOG: last completed transaction was at log time 2015-04-28 13:46:37.540952-04
LOG: selected new timeline ID: 2
LOG: archive recovery complete
LOG: database system is ready to accept connections
waiting for server to shut down....LOG: received fast shutdown request
LOG: aborting any active transactions
LOG: shutting down
LOG: database system is shut down
done
server stopped
source and target cluster are on the same timeline
Failure, exiting
waiting for server to start....LOG: database system was shut down at 2015-04-28 13:46:39 EDT
LOG: entering standby mode
LOG: consistent recovery state reached at 0/3311438
LOG: invalid record length at 0/3311438
LOG: database system is ready to accept read only connections
LOG: fetching timeline history file for timeline 2 from primary server
LOG: started streaming WAL from primary at 0/3000000 on timeline 1
LOG: replication terminated by primary server
DETAIL: End of WAL reached on timeline 1 at 0/30EB410.
LOG: new timeline 2 forked off current database system timeline 1 before current recovery point 0/3311438
LOG: restarted WAL streaming at 0/3000000 on timeline 1
LOG: replication terminated by primary server
DETAIL: End of WAL reached on timeline 1 at 0/30EB410.
LOG: new timeline 2 forked off current database system timeline 1 before current recovery point 0/3311438
done
server started
LOG: received immediate shutdown request
LOG: received immediate shutdown request
-----------------------------------------------------------------
Full successful pg_rewind_log_basic_remote output:
-----------------------------------------------------------------
waiting for server to start....LOG: database system was shut down at 2015-04-28 13:54:04 EDT
LOG: database system is ready to accept connections
done
server started
waiting for server to start....LOG: database system was interrupted; last known up at 2015-04-28 13:54:05 EDT
LOG: entering standby mode
LOG: redo starts at 0/2000028
LOG: consistent recovery state reached at 0/20000F8
LOG: database system is ready to accept read only connections
LOG: started streaming WAL from primary at 0/3000000 on timeline 1
done
server started
server promoting
LOG: received promote request
FATAL: terminating walreceiver process due to administrator command
LOG: invalid record length at 0/30EB410
LOG: redo done at 0/30EB3A0
LOG: last completed transaction was at log time 2015-04-28 13:54:07.547533-04
LOG: selected new timeline ID: 2
LOG: archive recovery complete
LOG: database system is ready to accept connections
waiting for server to shut down....LOG: received fast shutdown request
LOG: aborting any active transactions
LOG: shutting down
LOG: database system is shut down
done
server stopped
The servers diverged at WAL position 0/30EB410 on timeline 1.
Rewinding from last common checkpoint at 0/30EB3A0 on timeline 1
Done!
waiting for server to start....LOG: database system was interrupted while in recovery at log time 2015-04-28 13:54:08 EDT
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
LOG: entering standby mode
LOG: redo starts at 0/30EB368
LOG: consistent recovery state reached at 0/3113788
LOG: invalid record length at 0/3113788
LOG: database system is ready to accept read only connections
LOG: started streaming WAL from primary at 0/3000000 on timeline 2
done
server started
LOG: received immediate shutdown request
LOG: unexpected EOF on standby connection
LOG: received immediate shutdown request
-----------------------------------------------------------------
Thanks!
Stephen
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2015-04-28 18:05:08 | Re: Freeze avoidance of very large table. |
Previous Message | Pavel Stehule | 2015-04-28 17:58:15 | Re: cache invalidation for PL/pgsql functions |