Re: 7.0.2 dies when connection dropped mid-transaction

From: Alfred Perlstein <bright(at)wintelcom(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 7.0.2 dies when connection dropped mid-transaction
Date: 2000-11-10 02:43:24
Message-ID: 20001109184324.L11449@fw.wintelcom.net
Lists: pgsql-hackers

* Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> [001109 18:30] wrote:
> I said:
> > OK, after digging some more, it seems that the critical requirement
> > is that the cursor's query contain a hash join.
>
> Here's the deal:
>
> test7=# set enable_mergejoin to off;
> SET VARIABLE
> test7=# begin;
> BEGIN
> -- I've previously checked that this produces a hash join plan:
> test7=# declare c cursor for select * from foo t1, foo t2 where t1.f1=t2.f1;
> SELECT
> test7=# fetch 1 from c;
>  f1 | f1
> ----+----
>   1 |  1
> (1 row)
>
> test7=# abort;
> NOTICE: trying to delete portal name that does not exist.
> pqReadData() -- backend closed the channel unexpectedly.
> This probably means the backend terminated abnormally
> before or while processing the request.
>
> This happens with either 7.0.2 or 7.0.3 (probably with anything back to
> 6.5, if not before). It does *not* happen with current development tip.
>
> The problem is that two "portal" structures are used. One holds the
> overall query plan and execution state for the cursor, and the other
> holds the hash table for the hash join. During abort, the portal
> manager tries to delete both of them. BUT: deleting the query plan
> causes query cleanup to be executed, which among other things deletes
> the hash join's table. Then the portal manager tries to delete the
> already-deleted second portal, which leads first to the above notice
> and then to Assert failure (and probably would lead to coredump if
> you didn't have Asserts on). Alternatively, it might try to delete
> the hash join portal first, which would leave the query cleanup code
> deleting an already-deleted portal, and doubtless still crashing.
>
> Current sources don't show the problem because hashtables aren't kept
> in portals anymore.
>
> I've thought for some time that CollectNamedPortals is a horrid kluge,
> and really ought to be rewritten. Hadn't seen it actually do the wrong
> thing before, but now...
>
> I guess the immediate question is do we want to hold up 7.0.3 release
> for a fix? This bug is clearly ancient, so I'm not sure it's
> appropriate to go through a fire drill to fix it for 7.0.3.
> Comments?

I dunno, having the database crash because an errant client disconnected
without shutting down, or needed to abort a transaction, looks like
a show stopper.

We do track CVS and wouldn't have a problem shifting to 7_0_3_PATCHES,
but I'm not sure if the rest of the userbase is going to have much
fun.

It seems to be a serious problem; I think people wouldn't mind
waiting for you to squash this one.
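
For anyone skimming the thread, the double-delete described in the quoted
analysis can be modelled with a toy sketch. This is not PostgreSQL code:
the PortalRegistry class, its methods, and the portal names "cursor_c" and
"hash_table" are all hypothetical, made up only to illustrate the ordering
problem. The "checked" variant is just one possible guard, not the fix in
current sources (which, per the above, simply stopped keeping hash tables
in portals). In the real backend the stale delete ends in an Assert
failure; here it only records the notice.

```python
class PortalRegistry:
    """Toy stand-in for a portal manager (hypothetical, not PostgreSQL code)."""

    def __init__(self):
        self.portals = {}   # portal name -> optional cleanup callback
        self.notices = []   # messages the real backend would emit as NOTICE

    def create(self, name, cleanup=None):
        self.portals[name] = cleanup

    def drop(self, name):
        if name not in self.portals:
            # Mirrors the NOTICE shown in the session above.
            self.notices.append(
                "trying to delete portal name that does not exist.")
            return False
        cleanup = self.portals.pop(name)
        if cleanup is not None:
            cleanup()       # query cleanup may drop other portals
        return True

    def abort_naive(self):
        # Collect every portal name up front, then delete each one,
        # without noticing that an earlier cleanup already removed some.
        for name in list(self.portals):
            self.drop(name)

    def abort_checked(self):
        # Same walk, but re-check that the portal still exists first.
        for name in list(self.portals):
            if name in self.portals:
                self.drop(name)


def make_cursor_with_hash_join(reg):
    # The cursor's portal owns the plan; cleaning it up also deletes
    # the portal that holds the hash join's table.
    reg.create("cursor_c", cleanup=lambda: reg.drop("hash_table"))
    reg.create("hash_table")


if __name__ == "__main__":
    naive = PortalRegistry()
    make_cursor_with_hash_join(naive)
    naive.abort_naive()
    print(naive.notices)

    checked = PortalRegistry()
    make_cursor_with_hash_join(checked)
    checked.abort_checked()
    print(checked.notices)
```

In the naive walk, the second portal is deleted once by the cursor's
cleanup and then again by the abort loop, which is where the notice comes
from. The checked walk skips names that have already gone away.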

--
-Alfred Perlstein - [bright(at)wintelcom(dot)net|alfred(at)freebsd(dot)org]
"I have the heart of a child; I keep it in a jar on my desk."
