From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Archiver not exiting upon crash |
Date: | 2012-05-24 19:31:13 |
Message-ID: | 28714.1337887873@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> On Wed, May 23, 2012 at 2:21 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> However, I remain unsatisfied with this idea as an explanation for the
>> behavior you're seeing. In the first place, that race condition window
>> ought not be wide enough to allow failure probabilities as high as 10%.
>> In the second place, that code has been like that for a long while,
>> so this theory absolutely does not explain why you're seeing a
>> materially higher probability of failure in HEAD than 9.1. There is
>> something else going on.
> After a while trying to bisect the behavior, I decided it was a mug's
> game. Both arms of the race (the firing of archive_command and the
> engineered crash) are triggered indirectly be the same event, the
> start of a checkpoint. Small changes in the code can lead to small
> changes in the timing which make drastic changes in how likely it is
> that the two arms collide exactly at the vulnerability.
Ah. OK, that sounds more plausible than "it just happened".
> So my test harness is an inexplicably effective show-case for the
> vulnerability, but it is not the reason the vulnerability should be
> fixed.
Agreed.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Merlin Moncure | 2012-05-24 19:46:37 | Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile |
Previous Message | Jeff Janes | 2012-05-24 19:26:50 | Re: Archiver not exiting upon crash |