From: | Stephen Harris <lists(at)spuddy(dot)org> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Shutting down a warm standby database in 8.2beta3 |
Date: | 2006-11-22 18:56:23 |
Message-ID: | 20061122185623.GA23202@pugwash.spuddy.org |
Lists: | pgsql-general pgsql-hackers |
On Mon, Nov 20, 2006 at 11:20:41AM -0500, Tom Lane wrote:
>
> kill(child_pid, SIGxxx);
> #ifdef HAVE_SETSID
> kill(-child_pid, SIGxxx);
> #endif
>
> In the normal case where the child has already completed setsid(), the
> extra signal sent to it should do no harm. In the startup race
Hmm. It looks like something more than this may be needed. The postgres
recovery process appears to be ignoring the signal. I ran the whole database
in its own process group (ksh runs processes in their own process group
by default, so pg_ctl became the process group leader and everything under
pg_ctl stayed in that process group).
% ps -o pid,ppid,pgid,args -g 29141 | sort
PID PPID PGID COMMAND
29145 1 29141 /local/apps/postgres/8.2.b3.0/solaris/bin/postgres
29146 29145 29141 /local/apps/postgres/8.2.b3.0/solaris/bin/postgres
29147 29145 29141 /local/apps/postgres/8.2.b3.0/solaris/bin/postgres
29501 29147 29141 sh -c /export/home/swharris/rr 000000010000000100000057 pg_xlog/RECOVERYXLOG
29502 29501 29141 /bin/ksh -p /export/home/swharris/rr 000000010000000100000057 pg_xlog/RECOVERYX
29537 29502 29141 sleep 5
I did
kill -QUIT -29141 ; sleep 1 ; touch /export/home/swharris/archives/STOP_SWEH_RECOVERY
This sent the QUIT signal to all those processes. The shell script ignores
it and so tries to start again, so the 'touch' command tells it to exit(1)
rather than loop again.
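For readers following along, here is a minimal sketch of a restore script that behaves the way described above. It is not the actual 'rr' script, which is not shown in this message: the function name wait_for_segment, its argument order, and the default 5-second poll are all assumptions made for illustration. It catches QUIT and TERM with a do-nothing handler rather than ignoring them, so a child such as sleep keeps the default disposition and can still be killed while the script itself loops again.

```shell
#!/bin/sh
# Hypothetical stand-in for a restore script such as 'rr'.
# Every name below is an assumption; the real script is not shown.

# wait_for_segment ARCHIVE_DIR SEGMENT DEST STOP_FILE [POLL_SECONDS]
# Returns 0 once the segment has been copied to DEST,
# or 1 if STOP_FILE appears before the segment does.
wait_for_segment() {
    archive_dir=$1
    segment=$2
    dest=$3
    stop_file=$4
    poll=${5:-5}

    # Catch (not ignore) QUIT and TERM: the shell survives the signal,
    # but exec'd children such as sleep revert to the default action.
    trap ':' QUIT TERM

    while :; do
        # The trigger file is the only way to end the wait early.
        if [ -f "$stop_file" ]; then
            echo "$(date): End of recovery trigger file found" >&2
            return 1
        fi
        if [ -f "$archive_dir/$segment" ]; then
            echo "$(date): Attempting to restore $segment" >&2
            cp "$archive_dir/$segment" "$dest" || return 1
            echo "$(date): Finished $segment" >&2
            return 0
        fi
        echo "$(date): Waiting for file to become available" >&2
        sleep "$poll"
    done
}
```

A wrapper used as the restore command would call wait_for_segment with the archive directory, the requested segment name, the destination path, and the trigger file path, then exit with the function's return status. A non-zero exit tells the server the segment could not be restored.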
The log file (the timestamp entries are from my 'rr' program so I
can see what it's doing)...
To start with we see a normal recovery:
Wed Nov 22 13:41:20 EST 2006: Attempting to restore 000000010000000100000056
Wed Nov 22 13:41:25 EST 2006: Finished 000000010000000100000056
LOG: restored log file "000000010000000100000056" from archive
Wed Nov 22 13:41:25 EST 2006: Attempting to restore 000000010000000100000057
Wed Nov 22 13:41:25 EST 2006: Waiting for file to become available
Now I send the kill signal...
LOG: received immediate shutdown request
We can see that the sleep process got it!
/export/home/swharris/rr[37]: 29537 Quit(coredump)
And my script detects the trigger file
Wed Nov 22 13:43:51 EST 2006: End of recovery trigger file found
Now database recovery appears to continue as normal; the postgres
recovery processes are still running, despite having received SIGQUIT:
LOG: could not open file "pg_xlog/000000010000000100000057" (log file 1, segment 87): No such file or directory
LOG: redo done at 1/56000070
Wed Nov 22 13:43:51 EST 2006: Attempting to restore 000000010000000100000056
Wed Nov 22 13:43:55 EST 2006: Finished 000000010000000100000056
LOG: restored log file "000000010000000100000056" from archive
LOG: archive recovery complete
LOG: database system is ready
LOG: logger shutting down
pg_xlog now contains 000000010000000100000056 and 000000010000000100000057
A similar sort of thing happens if I use SIGTERM rather than SIGQUIT.
I'm out of here in an hour, so for all you US based people, have a good
Thanksgiving holiday!
--
rgds
Stephen