Re: "show all" command crashes server *** FIXED ***

From: Grant Maxwell <grant(dot)maxwell(at)maxan(dot)com(dot)au>
To: Grant Maxwell <grant(dot)maxwell(at)maxan(dot)com(dot)au>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Richard Huxton <dev(at)archonet(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: "show all" command crashes server *** FIXED ***
Date: 2009-09-13 23:32:30
Message-ID: FC5AE9BA-7F94-4164-9B91-ABE13FDA1B94@maxan.com.au
Lists: pgsql-general

First of all, thanks to those who provided input.

This problem is now fixed and I thought I would post this solution so
that others might benefit in the future.

For the sake of completeness:

The error was that running "show all" on this PostgreSQL (version
8.3) server made postgres crash and then recover.
Otherwise the server "seemed" healthy.

The postgres log showed:
Sep 10 23:55:36 theconsole postgres[31118]: [4-1] 0: LOG: 00000:
server process (PID 31145) was terminated by signal 11: Segmentation
fault
Sep 10 23:55:36 theconsole postgres[31118]: [4-2] 0: LOCATION:
LogChildExit, postmaster.c:2529
Sep 10 23:55:36 theconsole postgres[31118]: [5-1] 0: LOG: 00000:
terminating any other active server processes
Sep 10 23:55:36 theconsole postgres[31118]: [5-2] 0: LOCATION:
HandleChildCrash, postmaster.c:2374
Sep 10 23:55:36 theconsole postgres[31118]: [6-1] 0: LOG: 00000:
all server processes terminated; reinitializing
Sep 10 23:55:36 theconsole postgres[31118]: [6-2] 0: LOCATION:
PostmasterStateMachine, postmaster.c:2690
Sep 10 23:55:36 theconsole postgres[31146]: [7-1] 0: LOG: 00000:
database system was interrupted; last known up at 2009-09-10 23:55:14
EST
Sep 10 23:55:36 theconsole postgres[31146]: [7-2] 0: LOCATION:
StartupXLOG, xlog.c:4836
Sep 10 23:55:36 theconsole postgres[31147]: [7-1] [local] postgres
postgres 0: FATAL: 57P03: the database system is in recovery mode
Sep 10 23:55:36 theconsole postgres[31147]: [7-2] [local] postgres
postgres 0: LOCATION: ProcessStartupPacket, postmaster.c:1648
Sep 10 23:55:36 theconsole postgres[31146]: [8-1] 0: LOG: 00000:
database system was not properly shut down; automatic recovery in
progress
Sep 10 23:55:36 theconsole postgres[31146]: [8-2] 0: LOCATION:
StartupXLOG, xlog.c:5003
Sep 10 23:55:36 theconsole postgres[31146]: [9-1] 0: LOG: 00000:
record with zero length at 2A/E734761C
Sep 10 23:55:36 theconsole postgres[31146]: [9-2] 0: LOCATION:
ReadRecord, xlog.c:3126
Sep 10 23:55:36 theconsole postgres[31146]: [10-1] 0: LOG: 00000:
redo is not required
Sep 10 23:55:36 theconsole postgres[31146]: [10-2] 0: LOCATION:
StartupXLOG, xlog.c:5146
Sep 10 23:55:36 theconsole postgres[31150]: [7-1] 0: LOG: 00000:
autovacuum launcher started
Sep 10 23:55:36 theconsole postgres[31150]: [7-2] 0: LOCATION:
AutoVacLauncherMain, autovacuum.c:520
Sep 10 23:55:36 theconsole postgres[31118]: [7-1] 0: LOG: 00000:
database system is ready to accept connections

SOLUTION:
Increase the memory on the server.

WHY
We had installed Splunk on the server recently (a month before).
It was running OK.
The combination of Splunk and the other tasks running had pushed
memory usage too close to the limit.
What we did not notice was that swap had been almost completely
consumed - nasty.

RESULT
We shut everything down, doubled the memory and voila -
problem gone.

It goes to show that when hunting problems we should not ignore the
basic environmental elements.
It also goes to show that our monitoring system was not looking at
this relatively new server.
(this confession is not an invitation for a spanking)
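
For anyone wanting a quick check for the same condition, here is a minimal
sketch of a swap-usage test. It assumes a Linux server that exposes
/proc/meminfo with SwapTotal and SwapFree fields; the function names and the
80% threshold are illustrative choices, not what was actually deployed here.

```python
def parse_meminfo(text):
    """Return a dict of field name -> value (in kB) from /proc/meminfo-style text."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Name: value" shape
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_used_fraction(text):
    """Fraction of swap in use (0.0 to 1.0); 0.0 if no swap is configured."""
    info = parse_meminfo(text)
    total = info.get("SwapTotal", 0)
    free = info.get("SwapFree", 0)
    if total == 0:
        return 0.0
    return (total - free) / total


def swap_alert(text, threshold=0.8):
    """True when swap usage is at or above the (hypothetical) alert threshold."""
    return swap_used_fraction(text) >= threshold


def report(path="/proc/meminfo"):
    """Read the local meminfo file and print a one-line swap summary."""
    with open(path) as f:
        text = f.read()
    print(f"swap used: {swap_used_fraction(text):.0%}")
    if swap_alert(text):
        print("WARNING: swap is nearly exhausted")
```

Calling report() from cron, or wiring swap_alert() into whatever monitoring
is already in place, would be one way to flag the condition before it shows
up as odd crashes.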

Again, thanks for the help.
Grant

On 11/09/2009, at 9:09 AM, Grant Maxwell wrote:

>
> On 11/09/2009, at 8:36 AM, Tom Lane wrote:
>
>> Grant Maxwell <grant(dot)maxwell(at)maxan(dot)com(dot)au> writes:
>>> On the problem server:
>>> shared_preload_libraries = 'pgmemcache'
>>> #local_preload_libraries = ''
>>
>>> on the others both are empty.
>>
>> Sounds like a smoking gun to me.
>>
>>> For good measure I removed pgmemcache but the problem persists.
>>
>> Did you restart the postmaster afterwards? shared_preload_libraries
>> is only considered at postmaster start.
>
> yep - full restart.
>>
>> regards, tom lane
>
