From: | Alfred Perlstein <bright(at)wintelcom(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: 7.0.2 dies when connection dropped mid-transaction |
Date: | 2000-11-10 02:43:24 |
Message-ID: | 20001109184324.L11449@fw.wintelcom.net |
Lists: | pgsql-hackers |

* Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> [001109 18:30] wrote:
> I said:
> > OK, after digging some more, it seems that the critical requirement
> > is that the cursor's query contain a hash join.
>
> Here's the deal:
>
> test7=# set enable_mergejoin to off;
> SET VARIABLE
> test7=# begin;
> BEGIN
> -- I've previously checked that this produces a hash join plan:
> test7=# declare c cursor for select * from foo t1, foo t2 where t1.f1=t2.f1;
> SELECT
> test7=# fetch 1 from c;
>  f1 | f1
> ----+----
>   1 |  1
> (1 row)
>
> test7=# abort;
> NOTICE: trying to delete portal name that does not exist.
> pqReadData() -- backend closed the channel unexpectedly.
>         This probably means the backend terminated abnormally
>         before or while processing the request.
>
> This happens with either 7.0.2 or 7.0.3 (probably with anything back to
> 6.5, if not before). It does *not* happen with current development tip.
>
> The problem is that two "portal" structures are used. One holds the
> overall query plan and execution state for the cursor, and the other
> holds the hash table for the hash join. During abort, the portal
> manager tries to delete both of them. BUT: deleting the query plan
> causes query cleanup to be executed, which among other things deletes
> the hash join's table. Then the portal manager tries to delete the
> already-deleted second portal, which leads first to the above notice
> and then to Assert failure (and probably would lead to coredump if
> you didn't have Asserts on). Alternatively, it might try to delete
> the hash join portal first, which would leave the query cleanup code
> deleting an already-deleted portal, and doubtless still crashing.
>
> Current sources don't show the problem because hashtables aren't kept
> in portals anymore.
>
> I've thought for some time that CollectNamedPortals is a horrid kluge,
> and really ought to be rewritten. Hadn't seen it actually do the wrong
> thing before, but now...
>
> I guess the immediate question is do we want to hold up 7.0.3 release
> for a fix? This bug is clearly ancient, so I'm not sure it's
> appropriate to go through a fire drill to fix it for 7.0.3.
> Comments?

I dunno, having the database crash because an errant client disconnected
without shutting down, or needed to abort a transaction, looks like
a show stopper.

We do track CVS and wouldn't have a problem shifting to 7_0_3_PATCHES,
but I'm not sure if the rest of the userbase is going to have much
fun.

It seems to be a serious problem; I think people wouldn't mind
waiting for you to squash this one.
--
-Alfred Perlstein - [bright(at)wintelcom(dot)net|alfred(at)freebsd(dot)org]
"I have the heart of a child; I keep it in a jar on my desk."
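
For readers following the thread, the double deletion described in the quoted explanation can be modeled with a toy registry. This is only a sketch in Python, not PostgreSQL code: `PortalManager`, `create`, `drop`, `abort`, and the portal names are hypothetical, and the real backend hits an Assert failure where this toy merely records a notice. The second function assumes the arrangement the message attributes to current sources, where the hash table is no longer kept in a portal of its own.

```python
class PortalManager:
    """Toy registry of named portals (hypothetical, not PostgreSQL's API)."""

    def __init__(self):
        self.portals = {}   # portal name -> cleanup callback, or None
        self.notices = []   # stands in for NOTICE messages

    def create(self, name, cleanup=None):
        self.portals[name] = cleanup

    def drop(self, name):
        if name not in self.portals:
            # The toy only complains here; the message reports an Assert failure.
            self.notices.append(f"trying to delete missing portal {name!r}")
            return
        cleanup = self.portals.pop(name)
        if cleanup is not None:
            cleanup()

    def abort(self):
        # The list of names is collected before any cleanup runs, so it can
        # still contain portals that an earlier cleanup has already deleted.
        for name in list(self.portals):
            self.drop(name)


def abort_with_hash_table_in_portal():
    """Two portals: the cursor's cleanup also deletes the hash table portal."""
    mgr = PortalManager()
    mgr.create("c", cleanup=lambda: mgr.drop("hash_table"))
    mgr.create("hash_table")
    mgr.abort()
    return mgr.notices


def abort_with_hash_table_owned_by_query():
    """One portal: the hash table lives in the query state and is freed once."""
    mgr = PortalManager()
    query_state = {"hash_table": object()}
    mgr.create("c", cleanup=query_state.clear)
    mgr.abort()
    return mgr.notices
```

Creating the two portals in the opposite order still produces one notice in the first function, this time from the cursor's cleanup, which matches the "alternatively" case in the quoted text.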