Re: Postgres 8.4.20 seqfault on RHEL 6.4

From: Dave Johansen <davejohansen(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Postgres 8.4.20 seqfault on RHEL 6.4
Date: 2015-02-13 22:47:13
Message-ID: CAAcYxUfKNmEnQgEVc_9+OMr2ESx_DERNusxGxiL_X0TQzACJug@mail.gmail.com
Lists: pgsql-admin

On Fri, Feb 13, 2015 at 2:38 PM, Dave Johansen <davejohansen(at)gmail(dot)com>
wrote:

> On Thu, Feb 12, 2015 at 4:33 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Dave Johansen <davejohansen(at)gmail(dot)com> writes:
>> > I'm running Postgres 8.4.20 on RHEL 6.4 and it will occasionally crash. The
>> > postgres.log file just says that a PID was terminated. The output from
>> > dmesg has a message like this one:
>> > postmaster[22905]: segfault at 686 ip 0000000000000686 sp 00007fff83d72e88
>> > error 14 in postgres[400000+463000]
>>
>> > What can I do to try and figure out what is causing the crash and fix it?
>>
>> (1) install relevant postgresql-debuginfo package (assuming we're talking
>> about a Red Hat-originated postgres package)
>>
>> (2) run postmaster under "ulimit -c unlimited" (easiest way is probably
>> to add such a command to /etc/rc.d/init.d/postgresql and restart the
>> service)
>>
>> (3) wait for crash
>>
>> (4) gdb the resulting corefile (should be under your $PGDATA directory)
>>
>> (5) send in a stack trace.
>>
>
> Here's the stacktrace from gdb (if it matters, the package version from
> RHEL is postgresql-8.4.18-1.el6_4.x86_64):
> #0 0x0000000000000686 in ?? ()
> #1 0x00007f76ae551801 in ?? ()
> #2 0x00000000019f7793 in ?? ()
> #3 0x00007fff06ad6be0 in ?? ()
> #4 0x00007fff06ad6be0 in ?? ()
> #5 0x0000000000545e35 in ExecMakeFunctionResult (fcache=0x19f5680,
> econtext=0x19f37e8, isNull=0x19f7793 "", isDone=0x19f7b8c) at
> execQual.c:1870
> #6 0x0000000000541096 in ExecTargetList (projInfo=<value optimized out>,
> isDone=0x7fff06ad704c) at execQual.c:5212
> #7 ExecProject (projInfo=<value optimized out>, isDone=0x7fff06ad704c) at
> execQual.c:5427
> #8 0x0000000000553c5b in ExecResult (node=0x1999a68) at nodeResult.c:155
> #9 0x00000000005406c8 in ExecProcNode (node=0x1999a68) at
> execProcnode.c:344
> #10 0x000000000053e942 in ExecutePlan (queryDesc=0x1990c60,
> direction=<value optimized out>, count=0) at execMain.c:1542
> #11 0x... in standard_ExecutorRun (queryDesc=0x1990c60, direction=<value
> optimized out>, count=0) at execMain.c:310
> ... (I can include the rest, if it's needed)
>
> Any insight?
> Thanks,
> Dave
>
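
As an aside, the dmesg line quoted above already hints at the cause. Below is a minimal Python sketch for decoding it, assuming the usual x86 Linux format, in which the kernel prints the error code in hexadecimal and the bits follow the x86 page-fault error code layout. The function name decode_segfault and the wording of the flag descriptions are my own, not anything from the kernel or Postgres.

```python
import re

# x86 page-fault error code bits: (mask, meaning if set, meaning if clear).
FLAGS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]

# All four numbers in the message are hexadecimal, including the error code.
LINE_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_segfault(line):
    """Parse a kernel segfault message and describe its error code."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError("not a segfault line")
    addr, ip, sp, err = (int(match.group(k), 16) for k in ("addr", "ip", "sp", "err"))
    reasons = []
    for mask, if_set, if_clear in FLAGS:
        text = if_set if err & mask else if_clear
        if text is not None:
            reasons.append(text)
    return {
        "addr": addr,
        "ip": ip,
        "sp": sp,
        "error": err,
        "reasons": reasons,
        # Faulting address equal to the instruction pointer on an instruction
        # fetch means the process jumped to an unmapped address.
        "jumped_to_bad_address": addr == ip and bool(err & 0x10),
    }


if __name__ == "__main__":
    msg = ("postmaster[22905]: segfault at 686 ip 0000000000000686 "
           "sp 00007fff83d72e88 error 14 in postgres[400000+463000]")
    print(decode_segfault(msg))
```

For the message above, the error code is 0x14 (user mode, instruction fetch, page not present) and the faulting address equals the instruction pointer, which is what a call through a bad function pointer tends to look like. That fits frame #0 of the backtrace.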

Judging from the stack trace, the crash was happening in one of our C
functions. I did some digging, and it turned out that the permissions on the
folder holding those functions had been set wide open, so whenever someone
built our software, the build overwrote the .so files. Normally, only the
postgres user does that, when a new "version" is rolled out, but that
restriction was being bypassed because of the incorrect permissions.
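
A check along these lines would have caught the problem earlier. This is only a rough Python sketch: the function name find_unsafe_paths and the example directory are hypothetical, and it assumes the .so files sit directly in one directory. It flags the directory itself as well as the files, because a directory that others can write to normally lets them replace a file even when the file itself is read-only.

```python
import os
import stat


def find_unsafe_paths(lib_dir):
    """Return lib_dir and any .so files in it that group or others can write to."""
    candidates = [lib_dir] + [
        os.path.join(lib_dir, name)
        for name in sorted(os.listdir(lib_dir))
        if name.endswith(".so")
    ]
    unsafe = []
    for path in candidates:
        mode = os.stat(path).st_mode
        # Group-writable or world-writable means someone other than the
        # owner can modify or replace the library.
        if mode & (stat.S_IWGRP | stat.S_IWOTH):
            unsafe.append(path)
    return unsafe


if __name__ == "__main__":
    # Hypothetical location; substitute the folder that holds your functions.
    for path in find_unsafe_paths("/opt/myapp/pglib"):
        print("writable by group/others:", path)
```

Run as a cron job or as part of the rollout script, an empty result means only the owner can change the libraries.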

So that brings up a different question that I will start a new thread for.

Thanks for the help,
Dave
