Re: [HACKERS] FreeBSD problem under heavy load

From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: pgsql-hackers(at)hub(dot)org
Subject: Re: [HACKERS] FreeBSD problem under heavy load
Date: 1999-12-10 05:56:58
Message-ID: 19991210145658J.t-ishii@sra.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> writes:
> > I seem to run into a serious problem. With 6.5.x + FreeBSD 3.2, I get
> > a core under heavy load (16 or more concurrent users). FreeBSD 2.2.x
> > seems more stable but soon or later same thing happens. Examing a
> > core, I found it segfaulted in hash_search().
>
> I've been looking into this without much success. I cannot reproduce it
> here under HPUX --- I ran pgbench for several hours without seeing any
> problem. I also made another pass over the dynahash.c code looking for
> portability bugs, but didn't find anything that looked promising. (The
> code is ugly and fragile, but AFAICT it will work under existing usage
> patterns.) It's quite possible the problem is elsewhere and dynahash is
> just on the receiving end of a memory clobber ... but if so, we have
> very little to go on in guessing where to look.
>
> Can anyone else reproduce the problem? Does anything show up in the
> postmaster log at or just before the crash?
>
> regards, tom lane

I think I got it. in storage/lmgr/lock.c:WaitOnLock:

char old_status[64],
new_status[64];
:
:

strcpy(old_status, PS_STATUS);
strcpy(new_status, PS_STATUS);
strcat(new_status, " waiting");
PS_SET_STATUS(new_status);
:
:
PS_SET_STATUS(old_status);

The current status string is copied into old_status, then the pointer
to it is set to a gloable variable ps_status by PS_SET_STATUS
macro. Unfortunately old_status is on the stack, so once WaitOnLock
returns, ps_status would point to a garbage. In the subsequent call to
WaitOnLock,

strcpy(old_status, PS_STATUS);

will copy garbage string into old_status. So if the string is longer
than 63, the stack would be broken. Note that this would not happen on
Linux due to the difference of the definition of the macro. See
include/utils/ps_status.h for more details.

Also, I don't understand why:

strcpy(new_status, PS_STATUS);
strcat(new_status, " waiting");
PS_SET_STATUS(new_status);

is necessary. Just:

PS_SET_STATUS("waiting");

should be enough. After doing some tests on my FreeBSD and Linux box,
I will commit fixes to both current and 6.5 source tree.

> PS: pgbench's configure fails on HPUX, because HP's compiler doesn't
> like whitespace before #include. I modified configure.in like this:
>
> AC_TRY_LINK([#include <sys/time.h>
> #include <sys/resource.h>],
> [struct rlimit rlim;

Thanks. I will incorporate your fix.

BTW, I think pgbench is usefull to detect this kind of problems. Can I
put it into contrib or somewhere?
--
Tatsuo Ishii

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 1999-12-10 06:25:39 OK, what's this LDREL all about?
Previous Message Bruce Momjian 1999-12-10 05:44:55 6.6 release