From: | Satoshi Nagayasu <snaga(at)uptime(dot)jp> |
---|---|
To: | Greg Smith <greg(at)2ndquadrant(dot)com> |
Cc: | Greg Stark <stark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: buffer assertion tripping under repeat pgbench load |
Date: | 2013-01-27 07:32:58 |
Message-ID: | CAA8sozfi0Gn2ixcQg=U947O-f26yQdDh5487c_29Z5kP+12iQQ@mail.gmail.com |
Lists: | pgsql-hackers |
Hi,
I just reviewed this patch.
https://commitfest.postgresql.org/action/patch_view?id=1035
2013/1/13 Greg Smith <greg(at)2ndquadrant(dot)com>:
> On 12/26/12 7:23 PM, Greg Stark wrote:
>>
>> It's also possible it's a bad cpu, not bad memory. If it affects
>> decrement or increment in particular it's possible that the pattern of
>> usage on LocalRefCount is particularly prone to triggering it.
>
>
> This looks to be the winning answer. It turns out that under extended
> multi-hour loads at high concurrency, something related to CPU overheating
> was occasionally flipping a bit. One round of compressed air for all the
> fans/vents, a little tweaking of the fan controls, and now the system goes
> >24 hours with no problems.
>
> Sorry about all the noise over this. I do think the improved warning
> messages that came out of the diagnosis ideas are useful. The reworked code
> must slows down the checking a few cycles, but if you care about performance
> these assertions are tacked onto the biggest pig around.
>
> I added the patch to the January CF as "Improve buffer refcount leak warning
> messages". The sample I showed with the patch submission was a simulated
> one. Here's the output from the last crash before resolving the issue,
> where the assertion really triggered:
>
> WARNING: buffer refcount leak: [170583] (rel=base/16384/16578,
> blockNum=302295, flags=0x106, refcount=0 1073741824)
>
> WARNING: buffers with non-zero refcount is 1
> TRAP: FailedAssertion("!(RefCountErrors == 0)", File: "bufmgr.c", Line:
> 1712)
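A side note on the sample above: the second number in the refcount pair, 1073741824, is 2^30, a value with exactly one bit set, which at least fits the bit-flip explanation. A quick check, as a Python sketch (the helper name set_bits is hypothetical and not part of the patch; the value is simply copied from the WARNING line):

```python
def set_bits(value: int) -> list[int]:
    """Return the positions of the bits set in value, lowest first."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


# Second number from the refcount pair in the WARNING line above.
reported = 1073741824

# A non-zero value with exactly one bit set is a power of two.
is_single_bit = reported != 0 and (reported & (reported - 1)) == 0

print(set_bits(reported), is_single_bit)  # prints "[30] True"
```

This only shows that the reported value is one bit away from zero; it says nothing about where the flip happened.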
This patch is intended to improve the warning message in
AtEOXact_Buffers(), but I guess another function,
AtProcExit_Buffers(), needs to be modified as well. Right?
With this additional fix, I could apply the patch to the
current git master and compile it with the --enable-cassert
option.
Next, I need some suggestions from hackers to continue this review.
How should I reproduce this message for review?
It is a debug warning message, so it is not easy for me
to reproduce.
Any suggestions?
--
Satoshi Nagayasu <snaga(at)uptime(dot)jp>
Uptime Technologies, LLC http://www.uptime.jp/