Re: Troubleshooting a segfault and instance crash

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Blair Boadway <bboadway(at)abebooks(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Troubleshooting a segfault and instance crash
Date: 2018-03-08 18:34:08
Message-ID: CAFj8pRDP2iXomMz_p3SNYaSGLJA_uvOj2xvy2hMw-PLU1LxqBA@mail.gmail.com
Lists: pgsql-general

2018-03-08 19:16 GMT+01:00 Blair Boadway <bboadway(at)abebooks(dot)com>:

> Hi Pavel,
>
>
>
> I don’t have a core yet, the only way I have now is to intentionally crash
> the prod system a couple of times. Haven’t resorted to that yet.
>

It is hard to help without a backtrace, and for that you need a core dump.
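
Before waiting for the next crash, it may be worth confirming that the server process is actually allowed to write a core. The following is a minimal sketch, assuming Linux and Python 3; the helper names `parse_core_limit` and `core_limit_of` are hypothetical and only illustrate reading the "Max core file size" row of `/proc/<pid>/limits`.

```python
def parse_core_limit(limits_text):
    """Return (soft, hard) core-size limits from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max core file size"):
            fields = line.split()
            # Layout: Max core file size <soft> <hard> <units>
            return fields[4], fields[5]
    raise LookupError("no 'Max core file size' row found")


def core_limit_of(pid):
    """Read the limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as fh:
        return parse_core_limit(fh.read())


if __name__ == "__main__":
    import os
    # Checks this script's own limits; for a real check, pass the
    # postmaster PID instead (assumption: you know that PID).
    print(core_limit_of(os.getpid()))
```

A soft limit of `0` means no core will be written for that process. The limit that matters is the one inherited by the postmaster, not the one in an interactive shell.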

>
>
> Interesting that you mentioned pgaudit: it is installed on this system
> because that is our standard installation, but on this particular system
> we haven’t yet needed audits, so the audit role is ‘empty’. (And on a
> different system with the same installation and heavy use of auditing
> we’ve seen no segfaults.)
>
>
>

The other extensions are either simple, unrelated to DDL, or well known, so
pgaudit is the best candidate, but the error can be anywhere.

Regards

Pavel

> On this system
>
>
>
> pgaudit.role = 'auditor'
>
> pgaudit.log_parameter = off
>
> pgaudit.log_catalog = off
>
> pgaudit.log_statement_once = on
>
> pgaudit.log_level = log
>
>
>
>
>
> select * from information_schema.role_table_grants where grantee =
> 'auditor';
>
> (0 rows)
>
>
>
>
>
> thanks, Blair
>
>
>
> *From: *Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> *Date: *Thursday, March 8, 2018 at 9:49 AM
> *To: *Blair Boadway <bboadway(at)abebooks(dot)com>
> *Cc: *"pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
> *Subject: *Re: Troubleshooting a segfault and instance crash
>
>
>
> Hi
>
>
>
> 2018-03-08 18:40 GMT+01:00 Blair Boadway <bboadway(at)abebooks(dot)com>:
>
> Hello,
>
>
>
> We’re seeing an occasional segfault on a particular database
>
>
>
> Mar 7 14:46:35 pgprod2 kernel:postgres[29351]: segfault at 0 ip
> 000000302f32868a sp 00007ffcf1547498 error 4 in libc-2.12.so[302f200000+18a000]
>
> Mar 7 14:46:35 pgprod2 POSTGRES[21262]: [5] user=,db=,app=client= LOG:
> server process (PID 29351) was terminated by signal 11: Segmentation fault
>
>
>
> It crashes the database, though it starts again on its own without any
> apparent issues. This has happened 3 times in 2 months and each time the
> segfault error and memory address is the same. We’ve only seen it on one
> database, though we’ve seen it on both hosts of primary/standby setup—we
> switched over primary to other host and got a segfault there, which seems
> to eliminate a hardware issue. Oddly the database has no issues for normal
> DML workloads (it is a moderately busy prod OLTP system) but the segfault
> has happened very shortly after DDL changes are made. Most recently it
> happened while running a series of grants for new db users we were
> deploying (ie. running a sql script from psql on the primary host)
>
>
>
> grant usage on schema app to app_user1;
>
> grant usage on schema app to app_user2;
>
> ...
>
>
>
> Our set up is
>
> RHEL 6.9 - 2.6.32-696.16.1.el6.x86_64
>
> PostgreSQL 9.6.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7
> 20120313 (Red Hat 4.4.7-18), 64-bit
>
> Extensions - pg_cron,repmgr_funcs,pgaudit,pg_stat_statements,pg_hint_plan,pglogical
>
>
>
> So far can’t reproduce on a test system, have just added some OS config to
> collect core from the OS but haven’t collected a core yet. There isn’t any
> particular config change or extension that we can link to the problem, this
> is a system that has run for months without problems since last config
> changes. Appreciate any ideas.
>
>
>
> Can you get a core dump? Could it be a pgaudit bug? It is a complex
> extension.
>
> Regards
>
>
>
> Pavel
>
>
>
> Regards,
>
> Blair
>
>
>
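
Even without a core, the kernel line quoted above carries some information. The sketch below, assuming Python 3, parses that line, computes the faulting instruction's offset inside the mapped libc, and decodes the x86 page-fault error code. The function name `decode_segfault` is hypothetical, and the log line is the one from the thread joined onto a single line.

```python
import re

# Kernel message from the thread, joined onto one line.
LINE = ("postgres[29351]: segfault at 0 ip 000000302f32868a "
        "sp 00007ffcf1547498 error 4 in libc-2.12.so[302f200000+18a000]")

# All numeric fields in this message are printed in hexadecimal.
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    err = int(m["err"], 16)
    return {
        "object": m["obj"],
        "fault_address": int(m["addr"], 16),
        # Instruction pointer relative to where the object is mapped.
        "offset_in_object": int(m["ip"], 16) - int(m["base"], 16),
        # x86 page-fault error code bits.
        "page_present": bool(err & 1),   # 0 = page not present
        "write_access": bool(err & 2),   # 0 = read
        "user_mode": bool(err & 4),      # 1 = fault raised in user mode
    }


if __name__ == "__main__":
    info = decode_segfault(LINE)
    print(info["object"], hex(info["offset_in_object"]), info)
```

For this line the offset is `0x12868a`, and the fault is a user-mode read of address 0, which is consistent with a NULL pointer being passed into a libc routine. With a matching libc and its debug symbols, a tool such as `addr2line` or `gdb` may map that location to a function name; whether it expects the offset or the absolute address depends on how the library is linked and loaded on the host, so treat that step as something to verify.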
