SIGSEGV in BRIN autosummarize

From:	Justin Pryzby <pryzby(at)telsasoft(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject:	SIGSEGV in BRIN autosummarize
Date:	2017-10-14 03:57:32
Message-ID:	20171014035732.GB31726@telsasoft.com
Lists:	pgsql-hackers

I upgraded one of our customers to PG10 on Tuesday night, and on Wednesday
replaced a BTREE index with a BRIN index (WITH autosummarize).

Today I see:
< 2017-10-13 17:22:47.839 -04 >LOG: server process (PID 32127) was terminated by signal 11: Segmentation fault
< 2017-10-13 17:22:47.839 -04 >DETAIL: Failed process was running: autovacuum: BRIN summarize public.gtt 747263
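
A minimal sketch of how the trailing number in the activity string might relate to BRIN page ranges. It assumes the index uses the default pages_per_range of 128 and that 747263 is a heap block number belonging to the range queued for summarization; neither is confirmed by the log. Per the CREATE INDEX documentation, autosummarize queues summarization of the previous page range when an insertion is detected on the next one. The helper names are hypothetical.

```python
# Sketch only: assumes pages_per_range = 128 (the documented default) and that
# the number in "autovacuum: BRIN summarize ... 747263" is a heap block number.
PAGES_PER_RANGE = 128  # assumption: index created without a pages_per_range option


def range_start(block: int, pages_per_range: int = PAGES_PER_RANGE) -> int:
    """Return the first heap block of the BRIN page range containing `block`."""
    return block - block % pages_per_range


def queues_previous_range(heap_blk: int, pages_per_range: int = PAGES_PER_RANGE) -> bool:
    """True if `heap_blk` is the first block of a page range (other than range 0).
    With autosummarize on, an insert there queues summarization of the
    previous range (if that range is not already summarized)."""
    return heap_blk > 0 and heap_blk % pages_per_range == 0


block = 747263
start = range_start(block)
print(f"block {block} lies in the range {start}..{start + PAGES_PER_RANGE - 1}")
print(f"insert into block {block + 1} queues previous range: "
      f"{queues_previous_range(block + 1)}")
```

Under these assumptions 747263 is the last block of the range 747136..747263, so the work item would have been queued by an insert into block 747264, the first block of the next range.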

postmaster[32127] general protection ip:4bd467 sp:7ffd9b349990 error:0 in postgres[400000+692000]
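
Since the core dump was lost (see below), the kernel line is the only pointer to where the crash happened. A minimal sketch, assuming the "ip:<hex> ... in <object>[<base>+<size>]" layout seen in this one line, that extracts the faulting address and checks that it falls inside the reported postgres mapping. The pattern and function name are hypothetical.

```python
import re

# Kernel fault line copied from the report.
FAULT = ("postmaster[32127] general protection ip:4bd467 sp:7ffd9b349990 "
         "error:0 in postgres[400000+692000]")

# Assumed layout: "ip:<hex>" ... "in <object>[<base hex>+<size hex>]"
PATTERN = re.compile(
    r"ip:(?P<ip>[0-9a-f]+)\b.*\bin (?P<obj>[^\s\[]+)"
    r"\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(line: str):
    """Return (object name, faulting ip, offset of ip from the mapping base)."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("no ip/mapping fields found")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    offset = ip - base
    if not 0 <= offset < size:
        raise ValueError("ip lies outside the reported mapping")
    return m["obj"], ip, offset


obj, ip, offset = fault_offset(FAULT)
print(obj, hex(ip), hex(offset))  # postgres 0x4bd467 0xbd467
```

A base of 0x400000 suggests a non-PIE executable. If that holds, the raw ip could be passed straight to `addr2line -f -e /usr/pgsql-10/bin/postgres 0x4bd467`, given matching debug symbols. This was not tried here.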

[pryzbyj(at)database ~]$ rpm -qa postgresql10
postgresql10-10.0-1PGDG.rhel6.x86_64

Oct 13 17:22:45 database kernel: postmaster[32127] general protection ip:4bd467 sp:7ffd9b349990 error:0 in postgres[400000+692000]
Oct 13 17:22:47 database abrtd: Directory 'ccpp-2017-10-13-17:22:47-32127' creation detected
Oct 13 17:22:47 database abrt[32387]: Saved core dump of pid 32127 (/usr/pgsql-10/bin/postgres) to /var/spool/abrt/ccpp-2017-10-13-17:22:47-32127 (15040512 bytes)

..unfortunately, abrtd then discarded it:
Oct 13 17:22:47 database abrtd: Package 'postgresql10-server' isn't signed with proper key
Oct 13 17:22:47 database abrtd: 'post-create' on '/var/spool/abrt/ccpp-2017-10-13-17:22:47-32127' exited with 1
Oct 13 17:22:47 database abrtd: DELETING PROBLEM DIRECTORY '/var/spool/abrt/ccpp-2017-10-13-17:22:47-32127'

postgres=# SELECT * FROM bak_postgres_log_2017_10_13_1700 WHERE pid=32127 ORDER BY log_time DESC LIMIT 9;
-[ RECORD 1 ]----------+---------------------------------------------------------------------------------------------------------
log_time | 2017-10-13 17:22:45.56-04
pid | 32127
session_id | 59e12e67.7d7f
session_line | 2
command_tag |
session_start_time | 2017-10-13 17:21:43-04
error_severity | ERROR
sql_state_code | 57014
message | canceling autovacuum task
context | processing work entry for relation "gtt.public.cdrs_eric_egsnpdprecord_2017_10_13_recordopeningtime_idx"
-[ RECORD 2 ]----------+---------------------------------------------------------------------------------------------------------
log_time | 2017-10-13 17:22:44.557-04
pid | 32127
session_id | 59e12e67.7d7f
session_line | 1
session_start_time | 2017-10-13 17:21:43-04
error_severity | ERROR
sql_state_code | 57014
message | canceling autovacuum task
context | automatic analyze of table "gtt.public.cdrs_huawei_sgsnpdprecord_2017_10_13"

Time: 375.552 ms

It looks like this table was being inserted into simultaneously by a Python
program using multiprocessing: each subprocess was INSERTing into several
tables, each of which has one BRIN index on a timestamp column.

I don't have any reason to believe there's a memory issue on the server, so I
suppose this is just a "heads up" to early adopters until (or in case) it
happens again and I can at least provide a stack trace.

Justin

