From: | "David Esposito" <dvesposito(at)newnetco(dot)com> |
---|---|
To: | <pgsql-general(at)postgresql(dot)org> |
Subject: | Core dump on PG 7.1.3 |
Date: | 2002-04-01 20:45:56 |
Message-ID: | PEEDKNLDICKECFBNGNLLOEHMEOAA.dvesposito@newnetco.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hello,
I have pretty serious problem ...
~~~~~~~~~~~
Summary: On our production database, we are consistently getting core dumps
when trying to execute a particular sequence of queries.
~~~~~~~~~~~
Platform:
Redhat 7.1 (the kernel appears to be 2.4.2-2)
Postgres 7.1.3 (SELECT version() reports "PostgreSQL 7.1.3 on
i686-pc-linux-gnu, compiled by GCC 2.96")
~~~~~~~~~~~
Here is the actual query that causes the core dump ... It is basically
trying to update ~20k rows
UPDATE campaign_email
SET campaign_email_status = 2,
campaign_email_last_modified = message_queue.last_modification
FROM message_queue, message_queue_purge_lock
WHERE message_queue.message_id = message_queue_purge_lock.message_id
AND message_queue.campaign_email_id = campaign_email.campaign_email_id;
~~~~~~~~~~~
Relevant schema (if you want the entire schema, i can send that along too
... we have FK constraints setup)
Table "message_queue"
Attribute | Type | Modifier
-------------------------+--------------------------+----------------------
message_id | integer | not null
status | character(1) | not null default 'N'
lock_holder_session_id | character varying(16) |
campaign_id | integer | not null
campaign_email_id | integer | not null
Index: message_queue_pkey
Table "message_queue_purge_lock"
Attribute | Type | Modifier
------------+---------+----------
message_id | integer | not null
Index: message_queue_purge_lock_pkey
Table "campaign_email"
Attribute | Type | Modifier
------------------------------+--------------------------+----------
campaign_email_id | numeric(11,0) | not null
campaign_id | numeric(11,0) | not null
campaign_email_status | numeric(2,0) | not null
campaign_email_last_modified | timestamp with time zone | not null
campaign_email_detail | character varying(500) |
impression_date | timestamp with time zone |
Indices: campaign_email_bounce_date_idx,
campaign_email_optout_date_idx,
campaign_email_pkey,
email_campaign_idx,
email_campaign_patron_unq,
email_patron_idx
~~~~~~~~~~~
The Postgres log file reports the following:
Server process (pid 24542) exited with status 139 at Mon Apr 1 11:40:20
2002
Terminating any active server processes...
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend died abnormally and
possibly corrupted shared memory.
I have rolled back the current transaction and am going to terminate your
database system connection and exit.
Please reconnect to the database system and repeat your query.
The Data Base System is in recovery mode
Server processes were terminated at Mon Apr 1 11:40:20 2002
Reinitializing shared memory and semaphores
DEBUG: database system was interrupted at 2002-04-01 11:40:12 EST
DEBUG: CheckPoint record at (7, 3892464228)
DEBUG: Redo record at (7, 3892371300); Undo record at (0, 0); Shutdown
FALSE
DEBUG: NextTransactionId: 14376178; NextOid: 7673608
DEBUG: database system was not properly shut down; automatic recovery in
progress...
DEBUG: redo starts at (7, 3892371300)
The Data Base System is starting up
DEBUG: ReadRecord: record with zero len at (7, 3939513968)
DEBUG: redo done at (7, 3939513900)
The Data Base System is starting up
The Data Base System is starting up
The Data Base System is starting up
DEBUG: MoveOfflineLogs: remove 00000007000000E7
The Data Base System is starting up
DEBUG: database system is in production state
~~~~~~~~~~~
I didn't compile Postgres with debug symbols turned on so the backtrace from
the core dump is pretty useless looking
ore was generated by `postgres: postgres bnmail [local] UPDATE '.
Program terminated with signal 11, Segmentation fault.
#0 0x080b995f in ?? ()
(gdb) bt
#0 0x080b995f in ?? ()
#1 0x080ba4e5 in ?? ()
#2 0x080b9f37 in ?? ()
#3 0x080b9fb0 in ?? ()
#4 0x080ba1a1 in ?? ()
#5 0x080ba561 in ?? ()
#6 0x080b9f37 in ?? ()
#7 0x080b9fb0 in ?? ()
#8 0x080ba14d in ?? ()
#9 0x080ba555 in ?? ()
#10 0x080ba666 in ?? ()
#11 0x080bdfbd in ?? ()
#12 0x080bab13 in ?? ()
#13 0x080be194 in ?? ()
#14 0x080b928d in ?? ()
#15 0x080bfa08 in ?? ()
#16 0x080b92c9 in ?? ()
#17 0x080b8f16 in ?? ()
#18 0x080b8eee in ?? ()
#19 0x080b8847 in ?? ()
#20 0x080b84e1 in ?? ()
#21 0x080b79a8 in ?? ()
#22 0x080fc8d7 in ?? ()
#23 0x080fb3e6 in ?? ()
#24 0x080fc3ce in ?? ()
#25 0x080e7430 in ?? ()
#26 0x080e7023 in ?? ()
#27 0x080e628d in ?? ()
#28 0x080e5cd8 in ?? ()
#29 0x080c743f in ?? ()
#30 0x400ed306 in ?? ()
~~~~~~~~~~~
Here's the ultimate irony ... I cannot reproduce this bug on our
non-production database ... that means that I am unable to do the standard
experimentation that would further help diagnose this problem ... Reducing
the size of the data set, reducing the complexity of the tables, removing
the FK constraints ... all of those things are not an option on production
data ... However, since this problem needs to be fixed, I see myself being
able to do one of two things ...
- Recompiling PG 7.1.3 using the debug symbols ..
- Upgrading to PG 7.2
Is there anything I'm missing here that I could try that is non-invasive?
Thanks in advance
-Dave
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2002-04-01 21:16:50 | Re: Core dump on PG 7.1.3 |
Previous Message | David Esposito | 2002-04-01 20:43:15 | Core dump on PG 7.1.3 |