From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | r(dot)zharkov(at)postgrespro(dot)ru, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed |
Date: | 2019-04-06 17:10:25 |
Message-ID: | 20190406171025.x7mbhp6kct75oqny@alap3.anarazel.de |
Lists: | pgsql-bugs |
Hi,
On 2019-04-06 09:28:46 -0700, Andres Freund wrote:
> On 2019-04-06 12:23:06 -0400, Tom Lane wrote:
> > It seems that there may be some connection between this problem and
> > EPQ. I was working on committing Amit's fix for bug #15677, which
> > demonstrated that EPQ doesn't work for partitioned-table target rels.
> > It seemed like there really needed to be regression test coverage for
> > that, so I tried to convert his crasher example into an isolation test.
> > It does indeed crash without Amit's fix ... but with it, lookee what
> > I get:
> >
> > +error in steps c1 complexpartupdate: ERROR: unexpected table_lock_tuple status: 1
> >
> > That seems fully reproducible in this test. I haven't looked into
> > exactly what's causing that, but now that we have a reproducible
> > example, somebody should.
> >
> > I'm not quite sure if I should commit this as-is or wait till the
> > other problem is fixed. A crash is probably worse than a bogus
> > error, but I don't like committing obviously-wrong "expected" output.
> > Thoughts?
>
> Let me have a look at the testcase - I'd been running Roman's testcase
> for quite a few hours without being able to reproduce. But your testcase
> seems to trigger this reliably, so I hope I can make some quick
> progress.
Hm. I see what's wrong here - the new code assumed that we couldn't get
a SelfModified because the first version of the to-be-(deleted|updated)
tuple was visible. To properly discern that from the TM_Deleted case,
I had to change/fix heapam_lock_tuple's follow-the-update-chain logic to
return SelfModified rather than Invisible in this case, which is a more
accurate return code anyway. (I don't think we want to allow Invisible
there - we'd have to have waited for the earlier tuple version.)
I still don't understand how that'd be possible in Roman's case. Given
the workload, there should never be any self-updating going on?
Heavily-WIP patch attached.
I noticed that we say
+ ereport(ERROR,
+ (errcode(ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION),
+ errmsg("tuple to be updated was already modified by an operation triggered by the current command"),
in the ExecDelete() case (that's not new), which seems odd.
I think my fix would need a non-partition reproducer. I'll work on that,
and on polishing the patch, after having a coffee.
Greetings,
Andres Freund
Attachment | Content-Type | Size |
---|---|---|
fix-repeated-self-mod-chain.diff | text/x-diff | 3.1 KB |