From: | "Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp> |
---|---|
To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | <pgsql-hackers(at)postgreSQL(dot)org> |
Subject: | RE: [HACKERS] Sure enough, SI buffer overrun is broken |
Date: | 2000-01-30 03:14:55 |
Message-ID: | NDBBIJLOILGIKBGDINDFKEIECCAA.Inoue@tpf.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
>
> "Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp> writes:
> >> I built the current sources with MAXNUMMESSAGES set to 32 in
> >> src/include/storage/sinvaladt.h. The regular regress tests
> >> run OK, with just a few NOTICEs about 'cache state reset'
> >> and 'SI buffer overflow' inserted in the normal outputs
> >> (as you'd expect, if SI overrun occurs).
> >>
> >> However, the parallel tests crash spectacularly, with weird errors
> >> and Assert() coredumps.
>
> With these two changes in place, the parallel regress tests seem much
> more stable. There is still a big problem though, which is that the
> "portals" regress test sometimes fails. I traced this far enough to
> discover that the code is trying to use a TupleDesc that it's stored in
> the ScanTupleSlot of a plan, and this TupleDesc is no longer valid ---
> presumably the relcache entry it was gotten from has been flushed and
> rebuilt, leaving the plan with a dangling TupleDesc pointer. Ugh.
>
Oh great,I've also doubted relcache entry rebuild but wasn't able to
trace so far.
> I do not think it is very practical to try to change all of the places
> that assume that they can save pointers to the TupleDesc of a relcache
> entry. Instead I am thinking about solving the problem inside the
> relcache, as follows:
>
> During a relcache entry rebuild, do not simply free and reconstruct
> the TupleDesc. Instead, read the catalogs to build a new TupleDesc,
> and compare this one to the old one. If they are the same, free the
> new one instead (leaving the old one in place, and hence stored pointers
> to it are still valid). If they are not the same, then elog(ERROR).
>
> (elog may sound overly paranoid, but this condition indicates that the
> table's definition has actually changed since we grabbed the refcount on
> it --- remember we wouldn't be doing this at all if the relcache entry had
> zero refcount --- and therefore all kinds of derived information such as
> plans may be wrong. Pressing ahead will probably lead to crash.)
>
Sounds reasonable.
> We may need to do the same for any other substructures of the relcache
> entry that are visible from outside relcache.c.
>
> I know this sounds pretty grotty, but we are already doing it for the
> relcache entry itself --- rebuild re-uses the same physical entry
> rather than deleting and reallocating it, to ensure that pointers
> to an open Relation stay valid over an SI flush. We need to extend
> the same guarantee to the substructure of the relcache entry.
>
> Unless someone has a better idea, I'll work on that.
>
Agreed.
Regards.
Hiroshi Inoue
Inoue(at)tpf(dot)co(dot)jp
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2000-01-30 03:18:09 | Re: [HACKERS] freefuncs.c is never called from anywhere!? |
Previous Message | Hiroshi Inoue | 2000-01-30 03:14:53 | RE: ImmediateSharedRelationCacheInvalidate considered harmful |