From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
Cc: | Peter Geoghegan <pg(at)heroku(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: hung backends stuck in spinlock heavy endless loop |
Date: | 2015-01-16 14:22:27 |
Message-ID: | 20150116142227.GF16991@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2015-01-16 08:05:07 -0600, Merlin Moncure wrote:
> On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> > On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> >> Running this test on another set of hardware to verify -- if this
> >> turns out to be a false alarm which it may very well be, I can only
> >> offer my apologies! I've never had a new drive fail like that, in
> >> that manner. I'll burn the other hardware in overnight and report
> >> back.
>
> huh -- well possibly. not. This is on a virtual machine attached to a
> SAN. It ran clean for several (this is 9.4 vanilla, asserts off,
> checksums on) hours then the starting having issues:
Damn.
Is there any chance you can package this somehow so that others can run
it locally? It looks hard to find the actual bug here without adding
instrumentation to to postgres.
> [cds2 21952 2015-01-15 22:54:51.833 CST 5502]WARNING: page
> verification failed, calculated checksum 59143 but expected 59137 at
> character 20
> [cds2 21952 2015-01-15 22:54:51.852 CST 5502]QUERY:
> DELETE FROM "onesitepmc"."propertyguestcard" t
> WHERE EXISTS
> (
> SELECT 1 FROM "propertyguestcard_insert" d
> WHERE (t."prptyid", t."gcardid") = (d."prptyid", d."gcardid")
> )
> [cds2 21952 2015-01-15 22:54:51.852 CST 5502]CONTEXT: PL/pgSQL
> function cdsreconciletable(text,text,text,text,boolean) line 197 at
> EXECUTE statement
> SQL statement "SELECT * FROM CDSReconcileTable(
> t.CDSServer,
> t.CDSDatabase,
> t.SchemaName,
> t.TableName)"
> PL/pgSQL function cdsreconcileruntable(bigint) line 35 at SQL statement
This was the first error? None of the 'could not find subXID' errors
beforehand?
> [cds2 32353 2015-01-16 04:40:57.814 CST 7549]WARNING: did not find
> subXID 7553 in MyProc
> [cds2 32353 2015-01-16 04:40:57.814 CST 7549]CONTEXT: PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.018 CST 7549]WARNING: you don't own a
> lock of type AccessShareLock
> [cds2 32353 2015-01-16 04:40:58.018 CST 7549]CONTEXT: PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]LOG: could not send data
> to client: Broken pipe
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]STATEMENT: SELECT
> CDSReconcileRunTable(1160)
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING:
> ReleaseLockIfHeld: failed??
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING: you don't own a
> lock of type AccessShareLock
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING:
> ReleaseLockIfHeld: failed??
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]WARNING: you don't own a
> lock of type AccessShareLock
> [cds2 32353 2015-01-16 04:40:58.026 CST 7549]CONTEXT: PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
> ReleaseLockIfHeld: failed??
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: you don't own a
> lock of type AccessShareLock
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
> ReleaseLockIfHeld: failed??
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING: you don't own a
> lock of type ShareLock
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
> ReleaseLockIfHeld: failed??
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]ERROR: failed to re-find
> shared lock object
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]CONTEXT: PL/pgSQL
> function cdsreconcileruntable(bigint) line 35 during exception cleanup
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]STATEMENT: SELECT
> CDSReconcileRunTable(1160)
> [cds2 32353 2015-01-16 04:40:58.027 CST 7549]WARNING:
> AbortSubTransaction while in ABORT state
This indicates a bug in our subtransaction abort handling. It looks to
me like there actually might be several. But it's probably a consequence
of an earlier bug. It's hard to diagnose the actual issue, because we're
not seing the original error(s) :(.
Could you add a EmitErrorReport(); before the FlushErrorState() in
pl_exec.c's exec_stmt_block()?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Merlin Moncure | 2015-01-16 14:38:56 | Re: hung backends stuck in spinlock heavy endless loop |
Previous Message | Merlin Moncure | 2015-01-16 14:21:33 | Re: hung backends stuck in spinlock heavy endless loop |