From: | Doug McNaught <doug(at)wireboard(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Backends dying due to memory exhaustion--I'm stonkered |
Date: | 2001-01-27 00:35:09 |
Message-ID: | m3g0i5sugy.fsf@belphigor.mcnaught.org |
Lists: | pgsql-general |
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Doug McNaught <doug(at)wireboard(dot)com> writes:
> > The problem I'm having is that the backends will crash randomly, after
> > the database has been up for a few days, with:
> > FATAL 1: Memory exhausted in AllocSetAlloc()
>
> > The system has plenty of memory and swap, and under normal
> > circumstances the backends take up 10-15 megabytes. If it's a
> > runaway situation of some kind, it happens very fast, as I've even
> > taken snapshots of the process table at 1 minute intervals, and they
> > show no abnormality right up to the time of the crash.
>
> Hmm. That puts a damper on the idea that it's a memory leak --- doesn't
> eliminate the theory entirely, however. The other likely theory is that
> you've got a variable-size column value someplace whose size word has
> been corrupted, so that it claims to be umpteen megabytes long. Any
> attempt to copy such a value out of the tuple it's in will result in
> an instant "out of memory" complaint.
Hmm, very interesting. Does VARCHAR count as a variable-size column?
One funny thing is that the nightly VACUUM doesn't always fail--the
system will run smoothly for one to three days on average before a
crash.
> Is there any consistency about which table is being touched when the
> failure occurs? It's not hard to isolate and delete a damaged tuple
> once you know which table it's in, but if you've got a lot of tables
> the initial search can be tedious.
I'll check into this. Having just looked over my error logs, I see
some suspects but nothing jumps out at me. Unfortunately, OpenACS has
a boatload of tables, and there are 8 different instances, each with
its own database.
> One way to get more info is to tweak the code to abort() just before
> it would normally report the out-of-memory error. Then you will get
> a coredump and can learn something from the backtrace (don't forget
> to compile with -g).
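
For illustration, the abort-before-reporting idea can be sketched in plain C roughly as follows. This is a standalone sketch, not the actual PostgreSQL allocator code: alloc_or_abort() is a hypothetical wrapper around malloc(), standing in for whatever spot in the backend would normally report the out-of-memory error.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical allocation wrapper (an assumption for illustration,
 * not PostgreSQL's AllocSetAlloc()).  Instead of reporting the
 * failure and carrying on with error recovery, it calls abort() so
 * the process dies on the spot and can leave a core file.
 */
void *alloc_or_abort(size_t size)
{
    void *p = malloc(size);

    /* malloc(0) may legitimately return NULL, so only treat a NULL
     * result as a failure for a non-zero request. */
    if (p == NULL && size != 0) {
        fprintf(stderr, "out of memory requesting %lu bytes\n",
                (unsigned long) size);
        abort();    /* raises SIGABRT; core is dumped if the system allows it */
    }
    return p;
}
```

With the binary built with -g and core dumps permitted for the process, the backtrace from the resulting core should show which caller asked for the oversized allocation.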
That's a thought, and I will try it. I'm currently (as of yesterday's
crash) running with -d 2 and output sent to a logfile. Is this
debug level high enough to tell me which table contains the bad tuple,
if that's indeed the problem?
If I can't nail it down that way, how hard would it be to write a C
program to scan all the tuples in a database looking for bogus size
fields?
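
As a rough sketch of the core check such a scanner would need, here is a sanity test on size words in plain C. The layout is an assumption made for illustration only: each value starts with a 4-byte native-endian length word that counts itself, and values are packed back to back in a buffer. It does not parse real PostgreSQL heap pages; HDRSZ and find_bogus_size() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed header size: a 4-byte length word that includes itself. */
#define HDRSZ 4

/*
 * Walk a buffer of back-to-back length-prefixed values.
 * Returns the byte offset of the first value whose size word is
 * implausible (smaller than the header, or larger than what is left
 * in the buffer), or -1 if every size word looks sane.
 */
long find_bogus_size(const unsigned char *buf, size_t len)
{
    size_t off = 0;

    while (off < len) {
        int32_t sz;

        if (len - off < HDRSZ)
            return (long) off;          /* not even room for a header */

        memcpy(&sz, buf + off, sizeof sz);   /* avoid unaligned reads */

        if (sz < HDRSZ || (size_t) sz > len - off)
            return (long) off;          /* claims more bytes than exist */

        off += (size_t) sz;             /* step to the next value */
    }
    return -1;
}
```

A corrupted size word that claims to be many megabytes long would fail the second test immediately, since no value can be longer than the space remaining in the buffer that holds it.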
-Doug