Re: Transaction-Overflow

From: pingu(dot)freak(at)web(dot)de
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Transaction-Overflow
Date: 2007-08-08 09:38:11
Message-ID: 2041191463@web.de
Lists: pgsql-admin

Hi,

First, thanks for your answers.

Now I have found some ECC errors in the kernel log:

EDAC MC0: CE page 0xfeb8e, offset 0x0, grain 4096, syndrome 0xb32, row 4, channel 0, label "": e7xxx CE
EDAC MC0: CE - no information available: e7xxx CE log register overflow
EDAC MC0: CE page 0xfeb8e, offset 0x0, grain 4096, syndrome 0xb32, row 4, channel 0, label "": e7xxx CE
EDAC MC0: CE - no information available: e7xxx CE log register overflow
EDAC MC0: CE page 0xfeb8e, offset 0x0, grain 4096, syndrome 0xb32, row 4, channel 0, label "": e7xxx CE
EDAC MC0: CE - no information available: e7xxx CE log register overflow
EDAC MC0: CE page 0xfeb8e, offset 0x0, grain 4096, syndrome 0xb32, row 4, channel 0, label "": e7xxx CE
EDAC MC0: CE - no information available: e7xxx CE log register overflow
EDAC MC0: CE page 0xfeb8e, offset 0x0, grain 4096, syndrome 0xb32, row 4, channel 0, label "": e7xxx CE
EDAC MC0: CE - no information available: e7xxx CE log register overflow
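To see whether these corrected errors (CE) cluster on one memory location, the EDAC lines can be tallied by controller, row, channel and page. Below is a minimal Python sketch. It assumes the log text has been saved from dmesg, and its regular expression is written against the e7xxx line format shown here, so other EDAC drivers may need a different pattern. Errors that all land on the same row usually point toward one suspect DIMM. Mapping a row number to a physical slot is board-specific, though.

```python
import re
from collections import Counter

# Matches EDAC corrected-error lines such as:
# EDAC MC0: CE page 0xfeb8e, offset 0x0, grain 4096, syndrome 0xb32, row 4, channel 0, label "": e7xxx CE
CE_LINE = re.compile(
    r"EDAC (?P<mc>MC\d+): CE page (?P<page>0x[0-9a-f]+), offset 0x[0-9a-f]+, "
    r"grain \d+, syndrome (?P<syndrome>0x[0-9a-f]+), row (?P<row>\d+), "
    r"channel (?P<channel>\d+)"
)


def tally_corrected_errors(log_text):
    """Count CE events per (controller, row, channel, page).

    Lines that only report "log register overflow" carry no location and are skipped.
    """
    counts = Counter()
    for line in log_text.splitlines():
        match = CE_LINE.search(line)  # search() also tolerates prefixes like "<4>"
        if match:
            key = (match["mc"], int(match["row"]), int(match["channel"]), match["page"])
            counts[key] += 1
    return counts


if __name__ == "__main__":
    sample = """\
EDAC MC0: CE page 0xfeb8e, offset 0x0, grain 4096, syndrome 0xb32, row 4, channel 0, label "": e7xxx CE
EDAC MC0: CE - no information available: e7xxx CE log register overflow
<4>EDAC MC0: CE page 0xb4124, offset 0x0, grain 4096, syndrome 0xfc1, row 4, channel 0, label "": e7xxx CE
EDAC MC0: CE page 0xb4124, offset 0x0, grain 4096, syndrome 0xfc1, row 4, channel 0, label "": e7xxx CE
"""
    for (mc, row, channel, page), n in sorted(tally_corrected_errors(sample).items()):
        print(f"{mc} row {row} channel {channel} page {page}: {n}")
```

Only located CE lines are counted. The overflow lines mean the controller dropped further events, so the real error count is higher than any tally shows.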

This is on both servers, production and backup. Right now, I'm updating the kernel
to 2.6.22.1. Hopefully this helps :/, but I think there is no hope.

There are also traces in dmesg:

Code: f3 a5 89 c1 f3 a4 eb 21 89 c8 83 f9 07 76 18 89 f9 f7 d9 83 e1 07 29 c8 f3 a4 89 c1 c1 e9 02 83 e0 03 90 f3 a5 89 c1 f3 a4 5e 89 <c8> 5f c3 57 85 c9 56 89 c7 89 d6 79 08 0f 0b 0a 03 71 ce 2c c0
EIP: [<c01c3a2c>] __copy_from_user_ll_nozero+0xd7/0xda SS:ESP 0068:dca2fd94
<4>EDAC MC0: CE page 0xb4124, offset 0x0, grain 4096, syndrome 0xfc1, row 4, channel 0, label "": e7xxx CE
EDAC MC0: CE page 0xb4124, offset 0x0, grain 4096, syndrome 0xfc1, row 4, channel 0, label "": e7xxx CE
EDAC MC0: CE page 0xb4124, offset 0x0, grain 4096, syndrome 0xfc1, row 4, channel 0, label "": e7xxx CE
EDAC MC0: CE page 0xb4124, offset 0x0, grain 4096, syndrome 0xfc1, row 4, channel 0, label "": e7xxx CE
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000002
printing eip:
c01c3a2c
*pde = 2f39e001
Oops: 0000 [#3]
SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/class
Modules linked in: nfs lockd nfs_acl sunrpc iptable_filter ip_tables x_tables lp parport_pc parport af_packet joydev st sr_mod ipv6 button battery ac apparmor aamatch_pcre loop dm_mod e1000 ide_cd cdrom i2c_i801 e7xxx_edac edac_mc i2c_core ext3 mbcache jbd edd fan sg gdth aic79xx scsi_transport_spi piix thermal processor sd_mod scsi_mod ide_disk ide_core
CPU: 0
EIP: 0060:[<c01c3a2c>] Tainted: G U VLI
EFLAGS: 00010206 (2.6.18.2-34-bigsmp #1)
EIP is at __copy_from_user_ll_nozero+0xd7/0xda
eax: e5f17dbc ebx: 00000001 ecx: 00000006 edx: bff0ef9a
esi: 00000000 edi: 01de802f ebp: 00000006 esp: e5f17d94
ds: 007b es: 007b ss: 0068
Process postmaster (pid: 13414, ti=e5f16000 task=e93710b0 task.ti=e5f16000)
Stack: c01a81f2 00000000 00003466 0000000e e5f17dbc 00000000 00000000 00000001
00000002 00000000 ffff0002 c0100000 00000000 b6b86840 b6b85000 c1d9df20
00001000 d4f6ae9c 21741707 46b611a0 c0125770 3b9aca00 00000163 80000000
Call Trace:
[<c01a81f2>] exit_sem+0x58/0x14c
[<c0125770>] current_fs_time+0x4f/0x5b
[<c014ca56>] get_page_from_freelist+0x2f1/0x371
[<c01487f7>] find_lock_page+0x1a/0x77
[<c015f3b5>] shmem_getpage+0x4f2/0x552
[<c0160375>] shmem_nopage+0xa4/0xb6
[<c0154076>] __handle_mm_fault+0x63e/0xb9c
[<c01325aa>] autoremove_wake_function+0x0/0x35
[<c0108567>] sys_ipc+0x5e/0x1bb
[<c0103ddd>] sysenter_past_esp+0x56/0x79
Code: f3 a5 89 c1 f3 a4 eb 21 89 c8 83 f9 07 76 18 89 f9 f7 d9 83 e1 07 29 c8 f3 a4 89 c1 c1 e9 02 83 e0 03 90 f3 a5 89 c1 f3 a4 5e 89 <c8> 5f c3 57 85 c9 56 89 c7 89 d6 79 08 0f 0b 0a 03 71 ce 2c c0
EIP: [<c01c3a2c>] __copy_from_user_ll_nozero+0xd7/0xda SS:ESP 0068:e5f17d94
<6>device eth0 left promiscuous mode

The hardware is 5 years old... It was not possible to get new hardware
for this project. :/

Regards,

Martin

-----Original Message-----
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Sent: 2007-08-07 20:41:08
To: pingu(dot)freak(at)web(dot)de
CC: pgsql-admin(at)postgresql(dot)org
Subject: Re: [ADMIN] Transaction-Overflow

pingu(dot)freak(at)web(dot)de writes:
> At the top of the log file is this. Do you know why the process was killed with
> signal 11? I'm a little bit confused :(.

> LOG: server process (PID 30399) was terminated by signal 11

SIG 11 (ie SIGSEGV) is pretty much the typical "generic crash"
indication. It most likely means you ran into a software bug or
corrupted data. There is no reason at all to think that it's got
anything to do with transaction ID wraparound --- that message is
only coming out because it always comes out at a database restart.
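The log reports only a signal number. The Python sketch below shows how a parent process turns a raw wait status into "terminated by signal N" using the POSIX wait-status helpers in Python's `os` module. It also shows whether the kernel flagged a core dump. It runs only on Unix-like systems. The status values in the example are synthetic and follow the traditional Linux encoding, where the low 7 bits hold the signal and bit 0x80 flags a core dump. The function name `describe_wait_status` is hypothetical.

```python
import os
import signal


def describe_wait_status(status):
    """Interpret a raw wait status the way a parent sees it after waitpid()."""
    if os.WIFEXITED(status):
        return f"exited with code {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        signum = os.WTERMSIG(status)
        try:
            name = signal.Signals(signum).name  # 11 maps to SIGSEGV on Linux
        except ValueError:
            name = f"signal {signum}"
        core = " (core dumped)" if os.WCOREDUMP(status) else ""
        return f"terminated by {name}{core}"
    return f"other status {status:#x}"


if __name__ == "__main__":
    # Synthetic raw statuses in the traditional Linux encoding.
    print(describe_wait_status(11))         # killed by signal 11, no core file written
    print(describe_wait_status(11 | 0x80))  # killed by signal 11, core file written
    print(describe_wait_status(1 << 8))     # normal exit with code 1
```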

What you ought to look into is what *did* cause the crash. Did it
produce a core file, and if so can you get a gdb stack trace from
the core?
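Here is a minimal Python sketch for pulling a backtrace out of a core file without an interactive gdb session. It assumes gdb is installed and uses gdb's `-batch` and `-ex` options to run `bt full` against the binary and the core. The binary and core paths are hypothetical placeholders. A core file is only written if the core-size limit (RLIMIT_CORE) allows it, and that limit is commonly raised with `ulimit -c unlimited` in the script that starts the server. `core_dumps_enabled()` checks only the limit of the process it runs in, so it means something only when run from the same environment that launches the postmaster.

```python
import resource
import shutil
import subprocess


def core_dumps_enabled():
    """True if this process's soft RLIMIT_CORE permits a non-empty core file."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    return soft != 0


def gdb_backtrace_command(binary, core):
    """Build a gdb command line that prints a full backtrace from a core file, then exits."""
    return ["gdb", "-batch", "-ex", "bt full", binary, core]


def get_backtrace(binary, core):
    """Run gdb non-interactively; needs gdb on PATH and a binary matching the core."""
    if shutil.which("gdb") is None:
        raise RuntimeError("gdb not found on PATH")
    result = subprocess.run(
        gdb_backtrace_command(binary, core),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical paths: replace with the real postgres executable and core file.
    print("core dumps enabled:", core_dumps_enabled())
    print(" ".join(gdb_backtrace_command("/usr/local/pgsql/bin/postgres", "/path/to/core")))
```

The main block only prints the command. Call `get_backtrace()` yourself once the paths point at a real binary and core.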

regards, tom lane
