Re: 7.0.2 dies when connection dropped mid-transaction

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alfred Perlstein <bright(at)wintelcom(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 7.0.2 dies when connection dropped mid-transaction
Date: 2000-11-10 02:30:30
Message-ID: 2925.973823430@sss.pgh.pa.us
Lists: pgsql-hackers

I said:
> OK, after digging some more, it seems that the critical requirement
> is that the cursor's query contain a hash join.

Here's the deal:

test7=# set enable_mergejoin to off;
SET VARIABLE
test7=# begin;
BEGIN
-- I've previously checked that this produces a hash join plan:
test7=# declare c cursor for select * from foo t1, foo t2 where t1.f1=t2.f1;
SELECT
test7=# fetch 1 from c;
 f1 | f1
----+----
  1 |  1
(1 row)

test7=# abort;
NOTICE: trying to delete portal name that does not exist.
pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally
before or while processing the request.

This happens with either 7.0.2 or 7.0.3 (probably with anything back to
6.5, if not before). It does *not* happen with current development tip.

The problem is that two "portal" structures are used. One holds the
overall query plan and execution state for the cursor, and the other
holds the hash table for the hash join. During abort, the portal
manager tries to delete both of them. BUT: deleting the query plan
causes query cleanup to be executed, which among other things deletes
the hash join's table. Then the portal manager tries to delete the
already-deleted second portal, which leads first to the above notice
and then to an Assert failure (and probably would lead to a coredump if
you didn't have Asserts on). Alternatively, it might try to delete
the hash join portal first, which would leave the query cleanup code
deleting an already-deleted portal, and doubtless still crashing.
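
To make the ordering problem concrete, here is a toy model in C. It is only a sketch of the failure pattern described above, not the actual backend code: the Portal struct, the fixed-size table, and the names portal_create, portal_drop, and abort_with_snapshot are all hypothetical, and the real cleanup path is considerably more involved. The one assumption it encodes is the one stated above: the abort logic collects every portal first and then drops each one, while dropping the cursor's portal also drops the hash table's portal as a side effect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_PORTALS 8

/* Toy stand-in for a portal; NOT the real PostgreSQL struct. */
typedef struct Portal {
    char name[32];
    bool live;
    struct Portal *owned;   /* portal torn down by this one's cleanup, or NULL */
} Portal;

static Portal portal_table[MAX_PORTALS];
static int stale_drops;     /* drops that hit an already-deleted portal */

/* Forget all portals and clear the stale-drop counter. */
void portal_reset(void)
{
    memset(portal_table, 0, sizeof portal_table);
    stale_drops = 0;
}

/* Create a portal in the first free slot (hypothetical helper). */
Portal *portal_create(const char *name, Portal *owned)
{
    for (int i = 0; i < MAX_PORTALS; i++) {
        if (!portal_table[i].live) {
            Portal *p = &portal_table[i];
            snprintf(p->name, sizeof p->name, "%s", name);
            p->live = true;
            p->owned = owned;
            return p;
        }
    }
    assert(!"portal table full");
    return NULL;
}

/*
 * Drop a portal. Dropping the cursor's portal runs "query cleanup",
 * which also drops the hash table's portal. Dropping a portal that is
 * already gone is the case that produced the NOTICE and the Assert
 * failure; here it is merely counted.
 */
bool portal_drop(Portal *p)
{
    if (!p->live) {
        stale_drops++;
        return false;
    }
    p->live = false;
    if (p->owned != NULL)
        portal_drop(p->owned);
    return true;
}

/*
 * Abort handling in the style described above: take a snapshot of all
 * live portals, then drop each one without rechecking whether an
 * earlier drop already removed it. Returns the number of stale drops.
 */
int abort_with_snapshot(void)
{
    Portal *snapshot[MAX_PORTALS];
    int n = 0;

    for (int i = 0; i < MAX_PORTALS; i++)
        if (portal_table[i].live)
            snapshot[n++] = &portal_table[i];

    for (int i = 0; i < n; i++)
        portal_drop(snapshot[i]);

    return stale_drops;
}
```

Creating a hash table portal and a cursor portal that owns it, then calling abort_with_snapshot(), yields one stale drop no matter which of the two sits first in the table: either the snapshot loop or the cleanup side effect reaches a portal that is already gone.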

Current sources don't show the problem because hashtables aren't kept
in portals anymore.

I've thought for some time that CollectNamedPortals is a horrid kluge,
and really ought to be rewritten. Hadn't seen it actually do the wrong
thing before, but now...

I guess the immediate question is whether we want to hold up the 7.0.3
release for a fix. This bug is clearly ancient, so I'm not sure it's
appropriate to go through a fire drill to fix it for 7.0.3.
Comments?

regards, tom lane
