Server crash (FailedAssertion) due to catcache refcount mis-handling

From: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Server crash (FailedAssertion) due to catcache refcount mis-handling
Date: 2017-08-08 12:34:56
Message-ID: CAM2+6=VEE30YtRQCZX7_sCFsEpoUkFBV1gZazL70fqLn8rcvBA@mail.gmail.com
Lists: pgsql-hackers

Hi,

We have observed a random server crash (FailedAssertion) while running a few
tests at our end. The stack trace is attached.

After looking at the stack trace and discussing it with my team members,
what we have observed is that SearchCatCacheList() increments a refcount
and then decrements it at the end. However, if we are inside the PG_TRY()
block (where we increment the refcount) and get hit by an interrupt, we
fail to decrement the refcount, and because of that we later get an
assertion failure.
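
To illustrate the pattern described above, here is a minimal standalone
sketch. It is not the catcache code: DemoEntry, demo_search() and the other
demo_* names are hypothetical, and plain setjmp()/longjmp() stands in for
PostgreSQL's error-recovery machinery. It only shows how a pin taken inside
a guarded block is left behind when the error path does not release it.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

/* Hypothetical stand-in for a cache entry carrying a reference count. */
typedef struct DemoEntry
{
    int refcount;
} DemoEntry;

/* Hypothetical stand-in for the saved error-recovery context. */
static jmp_buf demo_error_ctx;

static void demo_pin(DemoEntry *e)
{
    e->refcount++;
}

static void demo_unpin(DemoEntry *e)
{
    assert(e->refcount > 0);
    e->refcount--;
}

/* Simulates an interrupt being serviced in the middle of the guarded work. */
static void demo_interrupt(void)
{
    longjmp(demo_error_ctx, 1);
}

/*
 * Pin the entry, optionally get interrupted, and unpin on the normal path.
 * When cleanup_on_error is false, the error path skips the unpin, so the
 * pin leaks. Returns true if the guarded work completed normally.
 */
static bool demo_search(DemoEntry *e, bool interrupt, bool cleanup_on_error)
{
    /* volatile: modified between setjmp() and longjmp(), read afterwards */
    volatile bool pinned = false;

    if (setjmp(demo_error_ctx) == 0)
    {
        demo_pin(e);
        pinned = true;

        if (interrupt)
            demo_interrupt();   /* jumps to the error path below */

        demo_unpin(e);
        return true;
    }

    /* Error path: release the pin only if asked to. */
    if (cleanup_on_error && pinned)
        demo_unpin(e);
    return false;
}
```

In this sketch, calling demo_search(&e, true, false) on a zero-initialized
entry leaves e.refcount at 1, which is the kind of leftover count that a
later sanity check would trip over. With cleanup_on_error set to true, the
count returns to 0.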

To mimic the scenario, I have added a sleep in SearchCatCacheList() as
given below:

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index e7e8e3b..eb6d4b5 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -1520,6 +1520,9 @@ SearchCatCacheList(CatCache *cache,
hashValue = CatalogCacheComputeTupleHashValue(cache, ntp);
hashIndex = HASH_INDEX(hashValue, cache->cc_nbuckets);

+ elog(INFO, "Sleeping for 0.1 seconds.");
+ pg_usleep(100000L); /* 0.1 seconds */
+
bucket = &cache->cc_bucket[hashIndex];
dlist_foreach(iter, bucket)
{

And then followed these steps to get a server crash:

-- Terminal 1
DROP TYPE IF EXISTS typ;
DROP FUNCTION IF EXISTS func(x int);

CREATE TYPE typ AS (X VARCHAR(50), Y INT);

CREATE OR REPLACE FUNCTION func(x int) RETURNS int AS $$
DECLARE
  rec typ;
  var2 numeric;
BEGIN
  RAISE NOTICE 'Function Called.';
  REC.X := 'Hello';
  REC.Y := 0;

  IF (rec.Y + var2) = 0 THEN
    RAISE NOTICE 'Check Pass';
  END IF;

  RETURN 1;
END;
$$ LANGUAGE plpgsql;

SELECT pg_backend_pid();

SELECT func(1);

-- Terminal 2, should be run in parallel while SELECT func(1) is in progress
-- in terminal 1.
SELECT pg_terminate_backend(<pid of backend obtained in terminal 1>);

I thought it worth posting here to get others' attention.

I have observed this on the master branch, but it is also reproducible on
the back-branches.

Thanks
--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachment Content-Type Size
stack-trace.txt text/plain 7.4 KB
