From: | Greg Smith <greg(at)2ndQuadrant(dot)com> |
---|---|
To: | Greg Stark <stark(at)mit(dot)edu> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: buffer assertion tripping under repeat pgbench load |
Date: | 2013-01-13 05:34:07 |
Message-ID: | 50F2474F.5040204@2ndQuadrant.com |
Lists: | pgsql-hackers |
On 12/26/12 7:23 PM, Greg Stark wrote:
> It's also possible it's a bad cpu, not bad memory. If it affects
> decrement or increment in particular it's possible that the pattern of
> usage on LocalRefCount is particularly prone to triggering it.
This looks to be the winning answer. It turns out that under extended
multi-hour loads at high concurrency, something related to CPU
overheating was occasionally flipping a bit. One round of compressed
air for all the fans/vents, a little tweaking of the fan controls, and
now the system goes >24 hours with no problems.
Sorry about all the noise over this. I do think the improved warning
messages that came out of the diagnosis ideas are useful. The reworked
code must slow down the checking by a few cycles, but if you care about
performance, these assertions are tacked onto the biggest pig around.
I added the patch to the January CF as "Improve buffer refcount leak
warning messages". The sample I showed with the patch submission was a
simulated one. Here's the output from the last crash before resolving
the issue, where the assertion really triggered:
WARNING: buffer refcount leak: [170583] (rel=base/16384/16578, blockNum=302295, flags=0x106, refcount=0 1073741824)
WARNING: buffers with non-zero refcount is 1
TRAP: FailedAssertion("!(RefCountErrors == 0)", File: "bufmgr.c", Line: 1712)
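
As a quick sanity check on the single-bit-flip reading, here is a minimal sketch that lists which bits differ between an observed counter and the value it was expected to hold. The helper name flipped_bits is made up for illustration, and it assumes the second number printed after "refcount=" should have been 0 at that point:

```python
def flipped_bits(observed, expected=0):
    """Return the bit positions where observed differs from expected."""
    diff = observed ^ expected
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# Second value printed after "refcount=" in the warning above.
# Assumption: the correct value at that point would have been 0.
observed = 1073741824

print(flipped_bits(observed))  # prints [30]
```

Under that assumption the value differs from zero in exactly one position, bit 30 (1073741824 is 2^30), which fits one flipped bit rather than a long run of miscounted pins.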
--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com