Please help me debug regular segfaults on 8.3.10

From: pgsql <pgsql(at)lavabit(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Please help me debug regular segfaults on 8.3.10
Date: 2010-05-04 21:05:46
Message-ID: hrq276$1lu$1@news.hub.org
Lists: pgsql-general

Hi,

One of our PostgreSQL instances recently started to segfault several times a
week. I tried a couple of things to pin it down to a particular query
or job, but failed to find any pattern. All I can offer are some notes
and a set of similar-looking backtraces.

Thanks in advance.

Machine details
---------------
* CentOS release 5.4 (Final)
* Linux 2.6.18-164.15.1.el5 #1 SMP Wed Mar 17 11:30:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
* 4x Quad-Core AMD Opteron 8354
* 64GB RAM (ECC)

PostgreSQL packages
-------------------
* postgresql-8.3.10-2PGDG.el5
* postgresql-contrib-8.3.10-2PGDG.el5
* postgresql-devel-8.3.10-2PGDG.el5
* postgresql-libs-8.3.10-2PGDG.el5
* postgresql-plperl-8.3.10-2PGDG.el5
* postgresql-plpython-8.3.10-2PGDG.el5
* postgresql-pltcl-8.3.10-2PGDG.el5
* postgresql-server-8.3.10-2PGDG.el5

Environment
-----------
* Multiple databases, 1TB in total
* So far the backtraces involve three different databases
* Some larger hash indexes exist (requiring a reindex after each crash)
* The only loaded PL is PL/pgSQL
* The system sustains around 3000 TPS

Things that made no difference
------------------------------
* Updated from 8.3.7 to 8.3.10
* Updated OS kernel

2010-05-04 | core.21207
-----------------------
Core was generated by `postgres: <user> <database_1> <client ip>('.
Program terminated with signal 11, Segmentation fault.
[New process 21207]
#0 0x000000000066acae in pfree ()
(gdb) bt
#0 0x000000000066acae in pfree ()
#1 0x0000000000648c6e in ?? ()
#2 0x0000000000648f34 in ?? ()
#3 0x00000000006493d4 in RelationCacheInvalidateEntry ()
#4 0x0000000000644fcd in ?? ()
#5 0x0000000000644882 in ?? ()
#6 0x00000000006448be in CommandEndInvalidationMessages ()
#7 0x0000000000472993 in CommandCounterIncrement ()
#8 0x00000000005342ea in ?? ()
#9 0x0000000000534543 in SPI_execute_plan ()
#10 0x00002ad2863f0148 in ?? () from /usr/lib64/pgsql/plpgsql.so
#11 0x00002ad2863f1a26 in ?? () from /usr/lib64/pgsql/plpgsql.so
#12 0x00002ad2863f3372 in ?? () from /usr/lib64/pgsql/plpgsql.so
#13 0x00002ad2863f3ce5 in plpgsql_exec_function () from /usr/lib64/pgsql/plpgsql.so
#14 0x00002ad2863ea7be in plpgsql_call_handler () from /usr/lib64/pgsql/plpgsql.so
#15 0x000000000052023e in ExecMakeFunctionResult ()
#16 0x000000000051d1f3 in ExecProject ()
#17 0x000000000052df13 in ExecResult ()
#18 0x000000000051cc66 in ExecProcNode ()
#19 0x000000000051bedf in ExecutorRun ()
#20 0x00000000005b1481 in ?? ()
#21 0x00000000005b2689 in PortalRun ()
#22 0x00000000005ae3b0 in ?? ()
#23 0x00000000005af038 in PostgresMain ()
#24 0x00000000005856a7 in ?? ()
#25 0x000000000058632b in PostmasterMain ()
#26 0x000000000053eece in main ()

2010-04-29 | core.20832
-----------------------
Core was generated by `postgres: <user> <database_1> <client ip>('.
Program terminated with signal 11, Segmentation fault.
[New process 20832]
#0 0x000000000066acae in pfree ()
(gdb) bt
#0 0x000000000066acae in pfree ()
#1 0x0000000000648c6e in ?? ()
#2 0x0000000000648f34 in ?? ()
#3 0x00000000006493d4 in RelationCacheInvalidateEntry ()
#4 0x0000000000644fcd in ?? ()
#5 0x0000000000644882 in ?? ()
#6 0x00000000006448be in CommandEndInvalidationMessages ()
#7 0x0000000000472993 in CommandCounterIncrement ()
#8 0x00000000005342ea in ?? ()
#9 0x0000000000534543 in SPI_execute_plan ()
#10 0x00002b41879e1148 in ?? () from /usr/lib64/pgsql/plpgsql.so
#11 0x00002b41879e2a26 in ?? () from /usr/lib64/pgsql/plpgsql.so
#12 0x00002b41879e4372 in ?? () from /usr/lib64/pgsql/plpgsql.so
#13 0x00002b41879e4ce5 in plpgsql_exec_function () from /usr/lib64/pgsql/plpgsql.so
#14 0x00002b41879db7be in plpgsql_call_handler () from /usr/lib64/pgsql/plpgsql.so
#15 0x000000000052023e in ExecMakeFunctionResult ()
#16 0x000000000051d1f3 in ExecProject ()
#17 0x000000000052df13 in ExecResult ()
#18 0x000000000051cc66 in ExecProcNode ()
#19 0x000000000051bedf in ExecutorRun ()
#20 0x00000000005b1481 in ?? ()
#21 0x00000000005b2689 in PortalRun ()
#22 0x00000000005ae3b0 in ?? ()
#23 0x00000000005af038 in PostgresMain ()
#24 0x00000000005856a7 in ?? ()
#25 0x000000000058632b in PostmasterMain ()
#26 0x000000000053eece in main ()

2010-04-27 | core.25421
-----------------------
Core was generated by `postgres: <user> <database_1> <client ip>('.
Program terminated with signal 11, Segmentation fault.
[New process 25421]
#0 0x000000000066acae in pfree ()
(gdb) bt
#0 0x000000000066acae in pfree ()
#1 0x0000000000648c6e in ?? ()
#2 0x0000000000648f34 in ?? ()
#3 0x00000000006493d4 in RelationCacheInvalidateEntry ()
#4 0x0000000000644fcd in ?? ()
#5 0x0000000000644882 in ?? ()
#6 0x00000000006448be in CommandEndInvalidationMessages ()
#7 0x0000000000472993 in CommandCounterIncrement ()
#8 0x00000000005342ea in ?? ()
#9 0x0000000000534543 in SPI_execute_plan ()
#10 0x00002b41879e1148 in ?? () from /usr/lib64/pgsql/plpgsql.so
#11 0x00002b41879e2a26 in ?? () from /usr/lib64/pgsql/plpgsql.so
#12 0x00002b41879e4372 in ?? () from /usr/lib64/pgsql/plpgsql.so
#13 0x00002b41879e4ce5 in plpgsql_exec_function () from /usr/lib64/pgsql/plpgsql.so
#14 0x00002b41879db7be in plpgsql_call_handler () from /usr/lib64/pgsql/plpgsql.so
#15 0x000000000052023e in ExecMakeFunctionResult ()
#16 0x000000000051d1f3 in ExecProject ()
#17 0x000000000052df13 in ExecResult ()
#18 0x000000000051cc66 in ExecProcNode ()
#19 0x000000000051bedf in ExecutorRun ()
#20 0x00000000005b1481 in ?? ()
#21 0x00000000005b2689 in PortalRun ()
#22 0x00000000005ae3b0 in ?? ()
#23 0x00000000005af038 in PostgresMain ()
#24 0x00000000005856a7 in ?? ()
#25 0x000000000058632b in PostmasterMain ()
#26 0x000000000053eece in main ()

2010-04-24 | core.23631
-----------------------
Core was generated by `postgres: <user> <database_2> <client ip>('.
Program terminated with signal 11, Segmentation fault.
[New process 23631]
#0 0x000000000066acae in pfree ()
(gdb) bt
#0 0x000000000066acae in pfree ()
#1 0x0000000000648c6e in ?? ()
#2 0x0000000000648f34 in ?? ()
#3 0x00000000006493d4 in RelationCacheInvalidateEntry ()
#4 0x0000000000644fcd in ?? ()
#5 0x0000000000644882 in ?? ()
#6 0x00000000006448be in CommandEndInvalidationMessages ()
#7 0x0000000000472993 in CommandCounterIncrement ()
#8 0x00000000005342ea in ?? ()
#9 0x0000000000534543 in SPI_execute_plan ()
#10 0x00002b41879a0148 in ?? () from /usr/lib64/pgsql/plpgsql.so
#11 0x00002b41879a1a26 in ?? () from /usr/lib64/pgsql/plpgsql.so
#12 0x00002b41879a3372 in ?? () from /usr/lib64/pgsql/plpgsql.so
#13 0x00002b41879a3ce5 in plpgsql_exec_function () from /usr/lib64/pgsql/plpgsql.so
#14 0x00002b418799a7be in plpgsql_call_handler () from /usr/lib64/pgsql/plpgsql.so
#15 0x000000000052023e in ExecMakeFunctionResult ()
#16 0x000000000051d1f3 in ExecProject ()
#17 0x000000000052df13 in ExecResult ()
#18 0x000000000051cc66 in ExecProcNode ()
#19 0x000000000051bedf in ExecutorRun ()
#20 0x00000000005b1481 in ?? ()
#21 0x00000000005b2689 in PortalRun ()
#22 0x00000000005ae3b0 in ?? ()
#23 0x00000000005af038 in PostgresMain ()
#24 0x00000000005856a7 in ?? ()
#25 0x000000000058632b in PostmasterMain ()
#26 0x000000000053eece in main ()

2010-04-23 | core.9419
-----------------------
Core was generated by `postgres: <user> <database_1> <client ip>('.
Program terminated with signal 11, Segmentation fault.
[New process 9419]
#0 0x000000000066acae in pfree ()
(gdb) bt
#0 0x000000000066acae in pfree ()
#1 0x0000000000648c6e in ?? ()
#2 0x0000000000648f34 in ?? ()
#3 0x00000000006493d4 in RelationCacheInvalidateEntry ()
#4 0x0000000000644fcd in ?? ()
#5 0x0000000000644882 in ?? ()
#6 0x00000000006448be in CommandEndInvalidationMessages ()
#7 0x0000000000472993 in CommandCounterIncrement ()
#8 0x00000000005342ea in ?? ()
#9 0x0000000000534543 in SPI_execute_plan ()
#10 0x00002b3acaef4148 in ?? () from /usr/lib64/pgsql/plpgsql.so
#11 0x00002b3acaef5a26 in ?? () from /usr/lib64/pgsql/plpgsql.so
#12 0x00002b3acaef7372 in ?? () from /usr/lib64/pgsql/plpgsql.so
#13 0x00002b3acaef7ce5 in plpgsql_exec_function () from /usr/lib64/pgsql/plpgsql.so
#14 0x00002b3acaeee7be in plpgsql_call_handler () from /usr/lib64/pgsql/plpgsql.so
#15 0x000000000052023e in ExecMakeFunctionResult ()
#16 0x000000000051d1f3 in ExecProject ()
#17 0x000000000052df13 in ExecResult ()
#18 0x000000000051cc66 in ExecProcNode ()
#19 0x000000000051bedf in ExecutorRun ()
#20 0x00000000005b1481 in ?? ()
#21 0x00000000005b2689 in PortalRun ()
#22 0x00000000005ae3b0 in ?? ()
#23 0x00000000005af038 in PostgresMain ()
#24 0x00000000005856a7 in ?? ()
#25 0x000000000058632b in PostmasterMain ()
#26 0x000000000053eece in main ()

2010-04-22 | core.16801
-----------------------
Core was generated by `postgres: <user> <database_2> <client ip>('.
Program terminated with signal 11, Segmentation fault.
[New process 16801]
#0 0x000000000066acae in pfree ()
(gdb) bt
#0 0x000000000066acae in pfree ()
#1 0x0000000000648c6e in ?? ()
#2 0x0000000000648f34 in ?? ()
#3 0x00000000006493d4 in RelationCacheInvalidateEntry ()
#4 0x0000000000644fcd in ?? ()
#5 0x0000000000644882 in ?? ()
#6 0x00000000006448be in CommandEndInvalidationMessages ()
#7 0x0000000000472993 in CommandCounterIncrement ()
#8 0x00000000005342ea in ?? ()
#9 0x0000000000534543 in SPI_execute_plan ()
#10 0x00002b3acaeb3148 in ?? () from /usr/lib64/pgsql/plpgsql.so
#11 0x00002b3acaeb4a26 in ?? () from /usr/lib64/pgsql/plpgsql.so
#12 0x00002b3acaeb6372 in ?? () from /usr/lib64/pgsql/plpgsql.so
#13 0x00002b3acaeb6ce5 in plpgsql_exec_function () from /usr/lib64/pgsql/plpgsql.so
#14 0x00002b3acaead7be in plpgsql_call_handler () from /usr/lib64/pgsql/plpgsql.so
#15 0x000000000052023e in ExecMakeFunctionResult ()
#16 0x000000000051d1f3 in ExecProject ()
#17 0x000000000052df13 in ExecResult ()
#18 0x000000000051cc66 in ExecProcNode ()
#19 0x000000000051bedf in ExecutorRun ()
#20 0x00000000005b1481 in ?? ()
#21 0x00000000005b2689 in PortalRun ()
#22 0x00000000005ae3b0 in ?? ()
#23 0x00000000005af038 in PostgresMain ()
#24 0x00000000005856a7 in ?? ()
#25 0x000000000058632b in PostmasterMain ()
#26 0x000000000053eece in main ()

2010-04-15 | core.32242
-----------------------
Core was generated by `postgres: <user> <database_3> <client ip>('.
Program terminated with signal 11, Segmentation fault.
[New process 32242]
#0 0x000000000066acae in pfree ()
(gdb) bt
#0 0x000000000066acae in pfree ()
#1 0x0000000000648c6e in ?? ()
#2 0x0000000000648f34 in ?? ()
#3 0x00000000006493d4 in RelationCacheInvalidateEntry ()
#4 0x0000000000644fcd in ?? ()
#5 0x0000000000644882 in ?? ()
#6 0x00000000006448be in CommandEndInvalidationMessages ()
#7 0x0000000000472993 in CommandCounterIncrement ()
#8 0x0000000000525c25 in fmgr_sql ()
#9 0x000000000052023e in ExecMakeFunctionResult ()
#10 0x000000000051d1f3 in ExecProject ()
#11 0x000000000052df13 in ExecResult ()
#12 0x000000000051cc66 in ExecProcNode ()
#13 0x000000000051bedf in ExecutorRun ()
#14 0x00000000005b1481 in ?? ()
#15 0x00000000005b2689 in PortalRun ()
#16 0x00000000005ae3b0 in ?? ()
#17 0x00000000005af038 in PostgresMain ()
#18 0x00000000005856a7 in ?? ()
#19 0x000000000058632b in PostmasterMain ()
#20 0x000000000053eece in main ()

2010-04-14 | core.10776
-----------------------
Core was generated by `postgres: <user> <database_1> <client ip>('.
Program terminated with signal 11, Segmentation fault.
[New process 10776]
#0 0x000000000066acae in pfree ()
(gdb) bt
#0 0x000000000066acae in pfree ()
#1 0x0000000000648c6e in ?? ()
#2 0x0000000000648f34 in ?? ()
#3 0x00000000006493d4 in RelationCacheInvalidateEntry ()
#4 0x0000000000644fcd in ?? ()
#5 0x0000000000644882 in ?? ()
#6 0x00000000006448be in CommandEndInvalidationMessages ()
#7 0x0000000000472993 in CommandCounterIncrement ()
#8 0x00000000005342ea in ?? ()
#9 0x0000000000534543 in SPI_execute_plan ()
#10 0x00002b3acaeb3148 in ?? () from /usr/lib64/pgsql/plpgsql.so
#11 0x00002b3acaeb4a26 in ?? () from /usr/lib64/pgsql/plpgsql.so
#12 0x00002b3acaeb6372 in ?? () from /usr/lib64/pgsql/plpgsql.so
#13 0x00002b3acaeb6ce5 in plpgsql_exec_function () from /usr/lib64/pgsql/plpgsql.so
#14 0x00002b3acaead7be in plpgsql_call_handler () from /usr/lib64/pgsql/plpgsql.so
#15 0x000000000052023e in ExecMakeFunctionResult ()
#16 0x000000000051d1f3 in ExecProject ()
#17 0x000000000052df13 in ExecResult ()
#18 0x000000000051cc66 in ExecProcNode ()
#19 0x000000000051bedf in ExecutorRun ()
#20 0x00000000005b1481 in ?? ()
#21 0x00000000005b2689 in PortalRun ()
#22 0x00000000005ae3b0 in ?? ()
#23 0x00000000005af038 in PostgresMain ()
#24 0x00000000005856a7 in ?? ()
#25 0x000000000058632b in PostmasterMain ()
#26 0x000000000053eece in main ()
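
For anyone who wants to compare the traces above mechanically, here is a minimal sketch, assuming each trace is available as plain text in the gdb format shown. It strips the addresses and reports the leading frames that all traces share. The helper names frame_names and common_prefix are hypothetical and not part of PostgreSQL or gdb.

```python
import re

# Matches a gdb frame line such as
# "#3 0x00000000006493d4 in RelationCacheInvalidateEntry ()"
# and captures the frame number and the function name ("??" if unresolved).
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(")


def frame_names(bt_text):
    """Return the function names of one gdb backtrace, innermost first.

    gdb prints frame #0 once when the core is loaded and again after
    "bt", so the list is restarted whenever a frame #0 is seen.
    """
    names = []
    for line in bt_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if not match:
            continue  # headers, wrapped library paths, the "(gdb) bt" line
        if match.group(1) == "0":
            names = []
        names.append(match.group(2))
    return names


def common_prefix(traces):
    """Return the longest run of leading frames shared by all traces."""
    prefix = []
    for frames in zip(*traces):
        if len(set(frames)) != 1:
            break
        prefix.append(frames[0])
    return prefix
```

Feeding the output of frame_names for each core file into common_prefix gives the shared innermost frames. Unresolved "??" frames compare equal to each other here, so the result is only as precise as the available symbols.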
