From: | "Philip Poles" <philip(at)surfen(dot)com> |
---|---|
To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Need help with error |
Date: | 2000-07-05 20:45:39 |
Message-ID: | 025801bfe6c1$fac0a8f0$26ab6bcf@surfen.com |
Lists: pgsql-general
Greetings...
I'm not sure if this is relevant, but I've seen similar errors occur when there
are too many open files on the filesystem (running Linux RH 6.2). I can't say
whether the problem is in the backend, the Linux kernel, or somewhere else, not
being very conversant in such matters myself, but I did have our admin raise
the limit on the number of open files.
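In case anyone wants to check their own box: a rough sketch using Python's
resource module (this shows the per-process limit only; the system-wide cap
on Linux is a separate knob, /proc/sys/fs/file-max):

import resource

# Per-process cap on open file descriptors: (soft, hard).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('soft limit:', soft, 'hard limit:', hard)

# Raise the soft limit to match the hard limit; raising the
# hard limit itself requires root.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))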
As far as I recall, when this happens the postmaster tries to reset all
currently running backends. I don't think I've seen it dump core, but I can
reproduce the situation fairly easily (by running a hundred or so concurrent
7-table join queries) to find out... I'll try it on Friday, if I get a chance.
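If anyone else wants to try the same experiment, the gist is just a pile of
concurrent connections each running a big join. Something like this sketch
(the driver, database name, and tables are placeholders, not our actual
setup):

import threading

# Any Python DB-API driver for Postgres will do; psycopg2 here is
# only an example.
import psycopg2

# A placeholder 7-table join; substitute tables you have handy.
QUERY = """
SELECT count(*)
FROM t1, t2, t3, t4, t5, t6, t7
WHERE t1.id = t2.id AND t2.id = t3.id AND t3.id = t4.id
  AND t4.id = t5.id AND t5.id = t6.id AND t6.id = t7.id
"""

def run_query():
    # One connection means one backend process on the server side.
    conn = psycopg2.connect('dbname=test')
    try:
        cur = conn.cursor()
        cur.execute(QUERY)
        cur.fetchall()
    finally:
        conn.close()

# A hundred or so concurrent backends, all doing the 7-table join.
threads = [threading.Thread(target=run_query) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()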
Steven, I have no knowledge of how BSDI behaves, but might this have something
to do with your problem?
It seems to me as though postgres winds up with a LOT of open files when
processing complex queries - is this actually the case, or should I be looking
elsewhere for the cause of this problem?
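For what it's worth, the one crude check I know of on Linux is to count the
fd entries under /proc for each backend (a sketch; it needs read access to
those directories, so run it as the postgres user or as root):

import os

# Rough count of open file descriptors per postgres process, by
# listing /proc/<pid>/fd.
for pid in os.listdir('/proc'):
    if not pid.isdigit():
        continue
    try:
        with open('/proc/%s/cmdline' % pid) as f:
            cmdline = f.read()
        if 'postgres' not in cmdline and 'postmaster' not in cmdline:
            continue
        nfds = len(os.listdir('/proc/%s/fd' % pid))
        print('pid %s: %d open fds' % (pid, nfds))
    except OSError:
        # Process exited mid-scan, or permission denied; skip it.
        continue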
-Philip
P.S. Tom - I haven't actually been able to reproduce that problem I was having
with hash indices... it just went away... and I know nothing has changed, except
maybe the load on the server... I'll keep trying; maybe I can get a bug report in
about it after all.
----- Original Message -----
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Steven Saner <ssaner(at)pantheranet(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Sent: Wednesday, July 05, 2000 4:29 PM
Subject: Re: [GENERAL] Need help with error
Steven Saner <ssaner(at)pantheranet(dot)com> writes:
> Using Postgres 7.0 on BSDI 4.1
> For the last several days we are getting errors that look like this:
> Error: cannot write block 0 of krftmp4 [adm] blind.
> An interesting thing is that in this example, krftmp4 is a table that
> the user that got this error message would not have accessed in any
> way.
Right --- that's implicit in the blind-write logic. A blind write
means trying to dump out a dirty page from the shared buffer pool
that belongs to a relation your own backend hasn't touched.
Since the write fails, the dirty block remains in the shared buffer
pool, waiting for some other backend to try to dump it again and fail
again :-(
The simplest recovery method is to restart the postmaster, causing a new
buffer pool to be set up.
However, from a developer's perspective, I'm more interested in finding
out how you got into this state in the first place. We thought we'd
fixed all the bugs that could give rise to orphaned dirty blocks, which
was the cause of this type of error in all the cases we'd seen so far.
Perhaps there is still a remaining bug of that kind, or maybe you've
found a new way to cause this problem. Do you have time to do some
investigation before you restart the postmaster?
One thing I'd like to know is why the write is failing in the first
place. Have you deleted or renamed the krftmp4 table, or its containing
database adm, perhaps not long before these errors started appearing?
> When this happens, it seems that the backend dies, which
> ends up causing the backend connections for all users to die.
That shouldn't be happening either; blind write failure is classed as
a simple ERROR, not a FATAL error. Does any message appear in the
postmaster log? Is a corefile dumped, and if so what do you get from
a backtrace?
regards, tom lane