Re: [HACKERS] New regression driver

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: wieck(at)debis(dot)com (Jan Wieck), pgsql-hackers(at)postgreSQL(dot)org (PostgreSQL HACKERS)
Subject: Re: [HACKERS] New regression driver
Date: 1999-11-21 00:10:09
Message-ID: 7165.943143009@sss.pgh.pa.us
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> wieck(at)debis(dot)com (Jan Wieck) writes:
>> It is in utils/cache/catcache.c line 996. The comments say
>> that the code should prevent the backend from entering
>> infinite recursion while loading new cache entries.

> I will look at this. I don't think that the catcaches live in
> shared memory, so the problem is probably not what you suggest.
> The fact that the behavior is different under load may point to a
> real problem, not just an insufficiently clever debugging check.

Indeed, this is a real bug, and commenting out the code that caught
it is not the right fix!

What is happening is that utils/inval.c is trying to initialize some
variables that contain OIDs of system relations. This means calling
the catcache routines in order to look up relation names in pg_class.
However, if a shared cache inval message arrives from another backend
while that's happening, we recursively invoke inval.c to deal with the
message. And inval.c sees that its OID variables aren't initialized
yet, so it recursively calls the catcache routines to try to get them
initialized. Or, if just the first one's been initialized so far,
ValidateHacks() assumes they're all valid, and you can end up at the
elog(FATAL) panic at the bottom of CacheIdInvalidate(). I've got a core
dump which contains a ten-deep recursion between inval.c and syscache.c,
culminating in elog(FATAL) because the eleventh incoming sinval message
was just slow enough to let inval.c's first OID variable get filled in
before it arrived.
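The recursion described above can be modeled in a small, self-contained C program. This is a hypothetical simulation, not PostgreSQL code. lookup_relid(), handle_inval(), lazy_init_oids() and the pending_invals counter are invented stand-ins for the catcache lookup, inval.c's message handler, its OID initialization, and the incoming sinval queue. Each simulated lookup absorbs one queued message before returning, and the handler lazily initializes its OID variables. As a result, every queued message adds one more level of nesting.

```c
#include <assert.h>

/* Hypothetical model of the bug; none of these names exist in PostgreSQL. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Lazily initialized OID variables, standing in for inval.c's. */
static Oid pg_class_relid = InvalidOid;
static Oid pg_type_relid = InvalidOid;

static int pending_invals = 0; /* simulated sinval messages from other backends */
static int depth = 0;          /* current nesting of lazy_init_oids() */
static int max_depth = 0;      /* deepest nesting observed */

static void handle_inval(void);

/* Simulated catalog lookup: before returning, it absorbs one pending
 * invalidation message, which re-enters the message handler. */
static Oid lookup_relid(Oid answer)
{
    if (pending_invals > 0) {
        pending_invals--;
        handle_inval();
    }
    return answer;
}

static void lazy_init_oids(void)
{
    depth++;
    if (depth > max_depth)
        max_depth = depth;
    pg_class_relid = lookup_relid(1259); /* pg_class */
    pg_type_relid = lookup_relid(1247);  /* pg_type */
    assert(pg_class_relid != InvalidOid && pg_type_relid != InvalidOid);
    depth--;
}

/* Message handler: if its OIDs are not set yet, it tries to look them
 * up through the (simulated) cache, which lets the recursion build up. */
static void handle_inval(void)
{
    if (pg_class_relid == InvalidOid)
        lazy_init_oids();
}

/* Reset the simulated backend, queue `queued` further messages behind
 * the first one, deliver the first, and return the deepest nesting. */
int simulate_startup(int queued)
{
    pg_class_relid = pg_type_relid = InvalidOid;
    pending_invals = queued;
    depth = max_depth = 0;
    handle_inval();
    return max_depth;
}
```

With nothing else queued the initialization runs once, at depth 1. With ten more messages queued it nests eleven levels deep before the innermost lookups finally complete and the stack unwinds.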

In short: we don't deal very robustly with cache invals happening
during backend startup. Send invals at a new backend with just the
right timing, and it'll choke.

I am not sure if this bug is of long standing or if we introduced it
since 6.5. It's possible I created it while messing with the relcache
stuff a month or two ago. But I can easily believe that it's been
there a long time and we never had a way of reproducing the problem
with any reliability before.

I think the fix is to rip out inval.c's attempt to look up system
relation names, and just give it hardwired knowledge of their OIDs.
Even though it sort of works to do the lookups, it's bad practice for
routines that are potentially called during catcache initialization
to depend on the catcache to be already working. And there are other
places that already have hardwired knowledge of the system relation
OIDs, so...
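The proposed fix can be sketched the same way. With the system-relation OIDs compiled in as constants, the handler never calls back into the catalog cache, so a message that arrives during startup cannot trigger a new lookup. HYPO_PG_CLASS_RELID, HYPO_PG_TYPE_RELID, enum inval_target and classify_inval() are hypothetical names for illustration. Only the values 1259 and 1247, the well-known OIDs of pg_class and pg_type, are real.

```c
#include <assert.h>

typedef unsigned int Oid;

/* Hardwired system-relation OIDs. The macro names are hypothetical; the
 * values are the fixed OIDs of pg_class and pg_type. */
#define HYPO_PG_CLASS_RELID ((Oid) 1259)
#define HYPO_PG_TYPE_RELID  ((Oid) 1247)

enum inval_target { TARGET_CLASS, TARGET_TYPE, TARGET_OTHER };

/* Classify an incoming message by relation OID with no catalog lookup,
 * so it is safe to call before the catcache is working. */
enum inval_target classify_inval(Oid relid)
{
    switch (relid) {
    case HYPO_PG_CLASS_RELID:
        return TARGET_CLASS;
    case HYPO_PG_TYPE_RELID:
        return TARGET_TYPE;
    default:
        return TARGET_OTHER;
    }
}
```

classify_inval(1259) yields TARGET_CLASS without touching any cache, no matter how early in backend startup it runs. The cost is that the constants must stay in sync with the catalog definitions, which the other code with hardwired system relation OIDs already depends on.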

regards, tom lane