From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Server crash (FailedAssertion) due to catcache refcount mis-handling |
Date: | 2017-08-08 15:36:17 |
Message-ID: | 4244.1502206577@sss.pgh.pa.us |
Lists: | pgsql-hackers |

Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com> writes:
> We have observed a random server crash (FailedAssertion) while running a few
> tests at our end. The stack trace is attached.
> Looking at the stack trace, and after discussing it with my team members,
> we have observed that SearchCatCacheList() increments the refcount and
> then decrements it at the end. However, if we are in the TRY() block
> (where we increment the refcount) and are hit by an interrupt, we fail
> to decrement the refcount, and we later get an assertion failure.

Hm. So SearchCatCacheList has a PG_TRY block that is meant to release
those refcounts, but if you hit the backend with a SIGTERM while it's
in that function, control goes out through elog(FATAL) which doesn't
execute the PG_CATCH cleanup. But it does do AbortTransaction which
calls AtEOXact_CatCache, and that is expecting that all the cache
refcounts have reached zero.
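
To make the failure mode above concrete, here is a minimal standalone sketch in plain C. It is not PostgreSQL code: every `toy_` name is hypothetical, `setjmp`/`longjmp` stands in for PG_TRY/PG_CATCH, and a second jump target stands in for the elog(FATAL) exit path. It assumes only what the paragraph above describes: ERROR reaches the catch block, FATAL bypasses it, and both paths end in a check that expects the refcount to be zero.

```c
#include <assert.h>
#include <setjmp.h>

/* Toy severity levels; hypothetical stand-ins for ERROR and FATAL. */
enum toy_level { TOY_NONE, TOY_ERROR, TOY_FATAL };

static jmp_buf toy_catch_point;  /* where a toy ERROR unwinds to (PG_CATCH) */
static jmp_buf toy_exit_point;   /* where a toy FATAL goes (backend exit path) */
static int toy_refcount = 0;     /* models one catcache entry's refcount */

/* Drop one reference; it must have been taken earlier. */
void toy_release(void)
{
    assert(toy_refcount > 0);
    toy_refcount--;
}

/* Models elog(): ERROR jumps to the catch block, FATAL skips it entirely. */
void toy_elog(enum toy_level level)
{
    if (level == TOY_ERROR)
        longjmp(toy_catch_point, 1);
    if (level == TOY_FATAL)
        longjmp(toy_exit_point, 1);
}

/* Models the shape of SearchCatCacheList: take a refcount inside the
 * try block, release it in the catch block or on the normal path. */
void toy_search_list(enum toy_level interrupt)
{
    if (setjmp(toy_catch_point) == 0)
    {
        toy_refcount++;          /* protect the entry while working */
        toy_elog(interrupt);     /* an interrupt may arrive here */
    }
    else
    {
        toy_release();           /* catch-block cleanup, ERROR only */
        return;                  /* real code would re-throw here */
    }
    toy_release();               /* normal-path release */
}

/* Runs one toy "query" and returns the refcount seen by the
 * end-of-transaction check (the real check asserts it is zero). */
int toy_run(enum toy_level interrupt)
{
    toy_refcount = 0;
    if (setjmp(toy_exit_point) == 0)
        toy_search_list(interrupt);
    /* Normal return and FATAL exit both reach transaction cleanup. */
    return toy_refcount;
}
```

In this model `toy_run(TOY_NONE)` and `toy_run(TOY_ERROR)` both return 0, while `toy_run(TOY_FATAL)` returns 1: the leaked reference that the end-of-transaction check would trip over.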

We could respond to this by using PG_ENSURE_ERROR_CLEANUP there instead
of plain PG_TRY. But I have an itchy feeling that there may be a lot
of places with similar issues. Should we be revisiting the basic way
that elog(FATAL) works, to make it less unlike elog(ERROR)?
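
As a rough illustration of the PG_ENSURE_ERROR_CLEANUP idea, the standalone sketch below registers a cleanup callback before entering the try block, so that the exit path can run it even when the catch block is bypassed. Again this is a toy model in plain C, not the backend macro: all `toy_` names are hypothetical, a single callback slot stands in for the list of exit callbacks, and the ordering (callback first, end-of-transaction check second) is an assumption of the model.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Toy severity levels; hypothetical stand-ins for ERROR and FATAL. */
enum toy_level { TOY_NONE, TOY_ERROR, TOY_FATAL };

typedef void (*toy_exit_cb)(void);

static jmp_buf toy_catch_point;            /* toy PG_CATCH target */
static jmp_buf toy_exit_point;             /* toy FATAL exit path */
static int toy_refcount = 0;               /* models a catcache refcount */
static toy_exit_cb toy_pending_cb = NULL;  /* one-slot exit-callback "list" */

/* Drop one reference; it must have been taken earlier. */
void toy_release(void)
{
    assert(toy_refcount > 0);
    toy_refcount--;
}

/* Models elog(): ERROR jumps to the catch block, FATAL skips it. */
void toy_elog(enum toy_level level)
{
    if (level == TOY_ERROR)
        longjmp(toy_catch_point, 1);
    if (level == TOY_FATAL)
        longjmp(toy_exit_point, 1);
}

/* Same shape as the plain try/catch version, plus an exit callback
 * that is registered on entry and cancelled once cleanup is done. */
void toy_search_list_ensured(enum toy_level interrupt)
{
    toy_pending_cb = toy_release;    /* register cleanup for the exit path */
    if (setjmp(toy_catch_point) == 0)
    {
        toy_refcount++;
        toy_elog(interrupt);         /* an interrupt may arrive here */
        toy_pending_cb = NULL;       /* normal path: cancel registration */
    }
    else
    {
        toy_pending_cb = NULL;       /* ERROR path: cancel, then clean up */
        toy_release();
        return;                      /* real code would re-throw here */
    }
    toy_release();                   /* normal-path release */
}

/* Returns the refcount seen by the toy end-of-transaction check. */
int toy_run_ensured(enum toy_level interrupt)
{
    toy_refcount = 0;
    toy_pending_cb = NULL;
    if (setjmp(toy_exit_point) == 0)
        toy_search_list_ensured(interrupt);
    else if (toy_pending_cb != NULL)
    {
        toy_pending_cb();            /* FATAL path: run registered cleanup */
        toy_pending_cb = NULL;
    }
    return toy_refcount;
}
```

Here `toy_run_ensured` returns 0 for `TOY_NONE`, `TOY_ERROR`, and `TOY_FATAL` alike. The cost is that every call site needs its own registration, which is why the question of how many similar places exist matters.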

regards, tom lane