| From: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> | 
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
| Cc: | Peter Eisentraut <peter_e(at)gmx(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: The real reason why TAP testing isn't ready for prime time | 
| Date: | 2015-06-19 21:52:41 | 
| Message-ID: | CAB7nPqQhhb5ZNtq7MazbTatMx-Vovd3hEAaAr2Km7rDAhW_yQQ@mail.gmail.com | 
| Lists: | pgsql-hackers | 
On Sat, Jun 20, 2015 at 12:07 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
>> Now if we look at RewindTest.pm, there is the following code:
>>         if ($test_master_datadir)
>>         {
>>                 system
>>                   "pg_ctl -D $test_master_datadir -s -m immediate stop 2> /dev/null";
>>         }
>>         if ($test_standby_datadir)
>>         {
>>                 system
>>                   "pg_ctl -D $test_standby_datadir -s -m immediate stop 2> /dev/null";
>>         }
>> And I think that the problem is triggered because we are missing a -w
>> switch here, meaning that we do not wait for confirmation that the
>> server has stopped. Apparently, if the stop is slow enough, the next
>> server cannot start because the port is still held by the server that
>> is shutting down.
>
> After I woke up a bit more, I remembered that -w is already the default
> for "pg_ctl stop", so your diagnosis here is incorrect.
Ah, right, I forgot that. Perhaps I just got lucky in my runs.
> I suspect that the real problem is the arbitrary decision to use -m
> immediate.  The postmaster would ordinarily wait for its children to
> die, but on a slow machine we could perhaps reach the end of that
> 5-second timeout, whereupon the postmaster would SIGKILL its children
> *and exit immediately*.  I'm not sure how instantaneous SIGKILL is,
> but it seems possible that we could end up trying to start the new
> postmaster before all the children of the old one are dead.  If the
> shmem interlock is working properly that ought to fail.
>
> I wonder whether it's such a good idea for the postmaster to give
> up waiting before all children are gone (postmaster.c:1722 in HEAD).
I don't think so either.
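
For illustration only, here is a minimal sketch of what stopping the test servers without -m immediate could look like. The helper name stop_server_cmd is hypothetical and not part of RewindTest.pm, and the choice of "fast" as the default mode is an assumption. Whether that actually avoids the failure on slow machines is untested. -w is already the default for "pg_ctl stop" and is only spelled out to make the wait explicit.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of RewindTest.pm): build the argument list
# used to stop a server. The default mode "fast" is an assumption made for
# this sketch; the existing test code uses "immediate".
sub stop_server_cmd
{
	my ($datadir, $mode) = @_;
	$mode = 'fast' unless defined $mode;

	# Only the three shutdown modes documented for pg_ctl are accepted.
	die "unknown shutdown mode: $mode\n"
	  unless $mode =~ /^(?:smart|fast|immediate)$/;

	# Same option order as the existing code: options first, then "stop".
	return ('pg_ctl', '-D', $datadir, '-s', '-w', '-m', $mode, 'stop');
}
```

It could be called as system(stop_server_cmd($test_master_datadir)). The list form of system() bypasses the shell, so a data directory path containing spaces needs no quoting, but it also means stderr is not redirected to /dev/null as it is in the current code.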
-- 
Michael