From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Jeremy Drake <pgsql(at)jdrake(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: catalog corruption bug |
Date: | 2006-01-07 20:08:37 |
Message-ID: | 480.1136664517@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> On Sat, 7 Jan 2006, Tom Lane wrote:
>> I'll go fix CatCacheRemoveCList, but I think this is not the bug
>> we're looking for.
> Incidentally, one of my processes did get that error at the same time.
> All of the other processes had an error
> DBD::Pg::st execute failed: server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> But this one had the DBD::Pg::st execute failed: ERROR: duplicate key
> violates unique constraint "pg_type_typname_nsp_index"
Oh, that's interesting ... maybe there is some relation after all?
Hard to see what ...
> It looks like my kernel did not have the option to append the pid to core
> files, so perhaps they both croaked at the same time but only this one got
> to write a core file?
Yeah, they'd all be dumping into the same directory. It's reasonable to
suppose that the corefile you have is from the one that aborted last.
That would suggest that this is an effect, not the cause ... hmmm ...
A bit of a leap in the dark, but: maybe the triggering event for this
situation is not a "VACUUM pg_amop" but a global cache reset due to
sinval message buffer overrun. It's fairly clear how that would lead
to the CatCacheRemoveCList bug. The duplicate-key failure could be an
unrelated bug triggered by the same condition. I have no idea yet what
the mechanism could be, but cache reset is a sufficiently seldom-exercised
code path that it's entirely plausible that there are bugs lurking in it.
If this is correct then we could vastly increase the probability of
seeing the bug by setting up something to force cache resets at a high
rate. If you're interested I could put together a code patch for that.
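To make the overrun-to-reset path concrete, here is a toy model in C. It is a sketch under stated assumptions, not PostgreSQL's sinval code: every name in it (`InvalQueue`, `queue_send`, `queue_read`, `force_reset_every`) is hypothetical, and the four-slot queue is deliberately tiny. The writer never waits for a slow reader; a reader that would lose an unread message is instead flagged for a reset and, on its next read, is told to discard all of its cached catalog entries. The hypothetical `force_reset_every` knob pushes every reader through that path every Nth message, in the spirit of the high-rate reset test suggested above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a shared invalidation queue; hypothetical, not PostgreSQL code. */
#define QUEUE_SIZE  4   /* deliberately tiny so overrun is easy to trigger */
#define MAX_READERS 2   /* e.g. two backends */

typedef struct {
    int  msgs[QUEUE_SIZE];            /* ring buffer of invalidation messages */
    long max_msg;                     /* total messages ever written */
    long next_read[MAX_READERS];      /* next message number each reader wants */
    bool reset_pending[MAX_READERS];  /* reader lost messages: must flush caches */
    int  force_reset_every;           /* hypothetical test knob; 0 = disabled */
} InvalQueue;

typedef enum { READ_EMPTY, READ_MESSAGE, READ_RESET } ReadResult;

void queue_init(InvalQueue *q, int force_reset_every)
{
    assert(force_reset_every >= 0);
    q->max_msg = 0;
    q->force_reset_every = force_reset_every;
    for (int r = 0; r < MAX_READERS; r++) {
        q->next_read[r] = 0;
        q->reset_pending[r] = false;
    }
}

void queue_send(InvalQueue *q, int msg)
{
    /* The slot about to be overwritten holds message max_msg - QUEUE_SIZE.
     * A reader that has not consumed it yet would silently lose it, so it is
     * flagged for a reset instead of making the writer wait. */
    for (int r = 0; r < MAX_READERS; r++)
        if (q->max_msg - q->next_read[r] >= QUEUE_SIZE)
            q->reset_pending[r] = true;

    q->msgs[q->max_msg % QUEUE_SIZE] = msg;
    q->max_msg++;

    /* Test mode: push every reader through the reset path every Nth message. */
    if (q->force_reset_every > 0 && q->max_msg % q->force_reset_every == 0)
        for (int r = 0; r < MAX_READERS; r++)
            q->reset_pending[r] = true;
}

ReadResult queue_read(InvalQueue *q, int r, int *msg)
{
    if (q->reset_pending[r]) {
        /* Caller must discard all cached catalog entries; since it will
         * reload everything, the messages it skipped no longer matter. */
        q->reset_pending[r] = false;
        q->next_read[r] = q->max_msg;
        return READ_RESET;
    }
    if (q->next_read[r] == q->max_msg)
        return READ_EMPTY;
    *msg = q->msgs[q->next_read[r] % QUEUE_SIZE];
    q->next_read[r]++;
    return READ_MESSAGE;
}
```

The design pushes the whole cost of a slow reader onto that reader as a full cache rebuild. Under light load almost nobody falls a full queue behind, which is why the reset branch in a model like this gets so little exercise unless something forces it.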
> BTW, nothing of any interest made it into the backend log regarding what
> assert(s) failed.
What you'd be looking for is a line starting "TRAP:".
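In practice `grep '^TRAP:' logfile` does this search. For completeness, here is a minimal C sketch of the same filter. It assumes the backend's stderr is being captured into the log file being scanned, since that is where an assertion-failure report would be written; the function names `is_trap_line` and `print_trap_lines` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if a log line reports a failed assertion, i.e. begins with "TRAP:". */
bool is_trap_line(const char *line)
{
    assert(line != NULL);
    return strncmp(line, "TRAP:", 5) == 0;
}

/* Copy every line starting with "TRAP:" from log to out; return the count. */
int print_trap_lines(FILE *log, FILE *out)
{
    char line[4096];
    int matches = 0;
    bool at_line_start = true;  /* over-long lines arrive in several chunks */

    while (fgets(line, sizeof line, log) != NULL) {
        /* Only a chunk that begins a new line can start with "TRAP:". */
        if (at_line_start && is_trap_line(line)) {
            fputs(line, out);
            matches++;
        }
        at_line_start = strchr(line, '\n') != NULL;
    }
    return matches;
}
```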
regards, tom lane