| From: | Eric Rousse <eric(dot)rousse(at)telmatik(dot)com> | 
|---|---|
| To: | pgsql-general(at)postgresql(dot)org | 
| Subject: | Strange Postgresql crash | 
| Date: | 2006-11-16 17:51:32 | 
| Message-ID: | 455CA524.40108@telmatik.com | 
| Lists: | pgsql-general | 
Hello all,
I've been experiencing a strange crash. I never really looked into it,
since it was happening only every 1-2 months or so. But I've seen it a
lot in the past week, and the only clue I have is that it coincides with
the backups.
So, here's some info about it and about my machine:
When: it crashes at night, at around 4 AM, during the backup:
00 3 * * * root /export/dbsystem/pg_backup.sh va > /dev/null 2>&1
00 4 * * * root /export/dbsystem/pg_backup.sh b > /dev/null 2>&1
I moved the vacuum to another time, just to make sure the two jobs are
not in conflict; who knows!
Which version: 7.3.16, I used the tar.gz version from the website.
Normally, in a crash, the machine just hangs on a kernel panic. The
person on site always reboots the machine before taking a look at it.
I have almost never had a crash during the day, maybe once, and that
kernel panic mentioned ACPI. I'm not sure the nighttime crashes are the
same thing; I think I'll just disable ACPI in the kernel and see if
it's okay.
Anyway, here's a quick log from around 4 AM; it doesn't say much...
10.1.1.54 is our monitoring machine; it only connects to the port using
telnet.
"2006-11-16 03:55:39 [8681]   LOG:  pq_recvbuf: unexpected EOF on client connection
2006-11-16 03:55:39 [8681]   LOG:  incomplete startup packet
2006-11-16 03:56:39 [8682]   LOG:  connection received: host=10.1.1.54 port=4754
2006-11-16 03:56:39 [8682]   LOG:  pq_recvbuf: unexpected EOF on client connection
2006-11-16 03:56:39 [8682]   LOG:  incomplete startup packet
2006-11-16 03:57:39 [8684]   LOG:  connection received: host=10.1.1.54 port=4775
2006-11-16 03:57:39 [8684]   LOG:  pq_recvbuf: unexpected EOF on client connection
2006-11-16 03:57:39 [8684]   LOG:  incomplete startup packet
2006-11-16 03:58:39 [8685]   LOG:  connection received: host=10.1.1.54 port=4828
2006-11-16 03:58:39 [8685]   LOG:  pq_recvbuf: unexpected EOF on client connection
2006-11-16 03:58:39 [8685]   LOG:  incomplete startup packet
2006-11-16 03:59:24 [8132]   ERROR:  parser: parse error at or near "WHEREligneid" at character 72
2006-11-16 03:59:24 [8132]   LOG:  statement: Update Appels Set controller=4506500413, agentassignedligne='1012261'  WHEREligneid=4506500420
2006-11-16 03:59:49 [8686]   LOG:  connection received: host=10.1.1.54 port=4872
2006-11-16 03:59:50 [8686]   LOG:  pq_recvbuf: unexpected EOF on client connection
2006-11-16 03:59:50 [8686]   LOG:  incomplete startup packet
2006-11-16 04:00:02 [8702]   LOG:  connection received: host=10.1.1.45 port=50457
2006-11-16 04:00:02 [8702]   LOG:  connection authorized: user=postgres database=template1
2006-11-16 04:00:02 [8726]   LOG:  connection received: host=10.1.1.45 port=50458
2006-11-16 04:00:02 [8726]   LOG:  connection authorized: user=postgres database=martin_test
2006-11-16 04:00:29 [8744]   LOG:  connection received: host=10.1.1.45 port=50459
2006-11-16 04:00:29 [8744]   LOG:  connection authorized: user=postgres database=test
2006-11-16 04:00:29 [8762]   LOG:  connection received: host=10.1.1.45 port=50460
2006-11-16 04:00:29 [8762]   LOG:  connection authorized: user=postgres database=wincentrex
2006-11-16 04:00:39 [8763]   LOG:  connection received: host=10.1.1.54 port=4894
2006-11-16 04:00:40 [8763]   LOG:  pq_recvbuf: unexpected EOF on client connection
2006-11-16 04:00:40 [8763]   LOG:  incomplete startup packet
2006-11-16 04:02:26 [2534]   LOG:  database system was interrupted at 2006-11-16 03:57:36 EST
2006-11-16 04:02:26 [2534]   LOG:  checkpoint record is at C/6733EB68
2006-11-16 04:02:26 [2534]   LOG:  redo record is at C/6733EB68; undo record is at 0/0; shutdown FALSE
2006-11-16 04:02:26 [2534]   LOG:  next transaction id: 2720349894; next oid: 14377807
2006-11-16 04:02:26 [2534]   LOG:  database system was not properly shut down; automatic recovery in progress
2006-11-16 04:02:26 [2534]   LOG:  redo starts at C/6733EBA8
2006-11-16 04:02:27 [2534]   LOG:  ReadRecord: record with zero length at C/6735AB44
2006-11-16 04:02:27 [2534]   LOG:  redo done at C/6735AB20
2006-11-16 04:02:30 [2534]   LOG:  database system is ready"
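Most of the log above is the monitoring machine's connect-and-close probes, so the interesting entries are easy to miss. The following is only a rough Python sketch for filtering them out, not something taken from our setup: the function names `unwrap` and `drop_monitor_noise` are made up for illustration, and it assumes the line prefix shown above (timestamp, PID in brackets, level). It also assumes a PID is not reused within the excerpt.

```python
import re

# Prefix used in the log above, e.g.
# 2006-11-16 03:55:39 [8681]   LOG:  incomplete startup packet
LINE_RE = re.compile(
    r'^"?(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(\d+)\]\s+(\w+):\s+(.*)$'
)

MONITOR_HOST = "10.1.1.54"  # the monitoring machine mentioned above


def unwrap(raw_lines):
    """Join wrapped continuation lines onto the log entry they belong to."""
    entries = []
    for line in raw_lines:
        line = line.strip()
        if not line:
            continue
        if LINE_RE.match(line) or not entries:
            entries.append(line)          # a new timestamped entry
        else:
            entries[-1] += " " + line     # continuation of the previous entry
    return entries


def drop_monitor_noise(entries, monitor_host=MONITOR_HOST):
    """Drop every entry logged by a backend that served a monitor probe."""
    noisy_pids = set()
    for entry in entries:
        m = LINE_RE.match(entry)
        if not m:
            continue
        pid, msg = m.group(2), m.group(4)
        if f"host={monitor_host} " in msg or msg.startswith("incomplete startup packet"):
            noisy_pids.add(pid)
    kept = []
    for entry in entries:
        m = LINE_RE.match(entry)
        if m and m.group(2) in noisy_pids:
            continue
        kept.append(entry)
    return kept
```

Run over the excerpt above, this should leave the parse error from PID 8132, the backup connections from 10.1.1.45, and the recovery messages from PID 2534.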
Here's our active settings in postgresql.conf:
tcpip_socket = true
max_connections = 64
port = 5432
hostname_lookup = false
shared_buffers = 1520   # min max_connections*2 or 16, 8KB each
#shared_buffers = 12288  # min max_connections*2 or 16, 8KB each
max_fsm_relations = 1000        # min 10, fsm is free space map, ~40 bytes
max_fsm_pages = 10000           # min 1000, fsm is free space map, ~6 bytes
max_locks_per_transaction = 64  # min 10
wal_buffers = 8         # min 4, typically 8KB each
sort_mem = 32168                # min 64, size in KB
fsync = false
enable_seqscan = true
enable_indexscan = true
enable_tidscan = true
enable_sort = true
enable_nestloop = true
enable_mergejoin = true
enable_hashjoin = true
effective_cache_size =8000      # typically 8KB each
random_page_cost = 4            # units are one sequential page fetch cost
cpu_tuple_cost = 0.01           # (same)
cpu_index_tuple_cost = 0.001    # (same)
cpu_operator_cost = 0.0025      # (same)
log_connections = false
log_pid = true
log_statement = false
log_duration = false
log_timestamp = true
log_min_error_statement = notice # Values in order of increasing severity:
                                 #   debug5, debug4, debug3, debug2, debug1,
                                 #   info, notice, warning, error, panic(off)
syslog = 0                      # range 0-2
syslog_facility = 'LOCAL0'
syslog_ident = 'postgres'
LC_MESSAGES = 'en_US'
LC_MONETARY = 'en_US'
LC_NUMERIC = 'en_US'
LC_TIME = 'en_US'
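For scale, the memory-related values above can be converted to megabytes. This is just a Python sketch of the arithmetic, using the units given in the config comments (8KB pages, sort_mem in KB); the `summarize` helper is made up for illustration. The last figure is a hypothetical upper bound that assumes each of the 64 allowed connections runs exactly one sort at the same moment, since sort_mem applies per sort rather than per server.

```python
PAGE_KB = 8  # page size stated in the config comments ("8KB each")

# Values copied from the active settings above.
SETTINGS = {
    "shared_buffers": 1520,        # in 8KB pages
    "effective_cache_size": 8000,  # in 8KB pages
    "sort_mem": 32168,             # in KB
    "max_connections": 64,
}


def summarize(settings, page_kb=PAGE_KB):
    """Convert the page/KB based settings to megabytes."""
    kb_to_mb = 1 / 1024
    return {
        "shared_buffers_mb": settings["shared_buffers"] * page_kb * kb_to_mb,
        "effective_cache_size_mb": settings["effective_cache_size"] * page_kb * kb_to_mb,
        "sort_mem_mb": settings["sort_mem"] * kb_to_mb,
        # Assumption: one sort per connection, all at once.
        "sort_mem_all_connections_mb": (
            settings["sort_mem"] * settings["max_connections"] * kb_to_mb
        ),
    }


if __name__ == "__main__":
    for name, value in summarize(SETTINGS).items():
        print(f"{name}: {value:.1f} MB")
```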
I tested my memory with memtest, and it came out clean. I also ran some
stress tests under Linux, using stress and bonnie++, to see whether it
would crash with ACPI or not while doing a dump... So far it's okay.
The machine: Linux aquilonII 2.6.17-1.2142_FC4 #1 Tue Jul 11 22:41:14 
EDT 2006 i686 i686 i386 GNU/Linux
Does anyone have a suggestion?
-- 
Eric Rousse
514-655-1001
Telmatik inc.
204 Montarville, suite 250
Boucherville, QC, Canada
J4B 6S2