From: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
Subject: | SIGSEGV in BRIN autosummarize |
Date: | 2017-10-14 03:57:32 |
Message-ID: | 20171014035732.GB31726@telsasoft.com |
Lists: | pgsql-hackers |
I upgraded one of our customers to PG10 Tuesday night, and Wednesday replaced
a BTREE index with a BRIN index (WITH autosummarize).
Today I see:
< 2017-10-13 17:22:47.839 -04 >LOG: server process (PID 32127) was terminated by signal 11: Segmentation fault
< 2017-10-13 17:22:47.839 -04 >DETAIL: Failed process was running: autovacuum: BRIN summarize public.gtt 747263
postmaster[32127] general protection ip:4bd467 sp:7ffd9b349990 error:0 in postgres[400000+692000]
[pryzbyj(at)database ~]$ rpm -qa postgresql10
postgresql10-10.0-1PGDG.rhel6.x86_64
Oct 13 17:22:45 database kernel: postmaster[32127] general protection ip:4bd467 sp:7ffd9b349990 error:0 in postgres[400000+692000]
Oct 13 17:22:47 database abrtd: Directory 'ccpp-2017-10-13-17:22:47-32127' creation detected
Oct 13 17:22:47 database abrt[32387]: Saved core dump of pid 32127 (/usr/pgsql-10/bin/postgres) to /var/spool/abrt/ccpp-2017-10-13-17:22:47-32127 (15040512 bytes)
..unfortunately:
Oct 13 17:22:47 database abrtd: Package 'postgresql10-server' isn't signed with proper key
Oct 13 17:22:47 database abrtd: 'post-create' on '/var/spool/abrt/ccpp-2017-10-13-17:22:47-32127' exited with 1
Oct 13 17:22:47 database abrtd: DELETING PROBLEM DIRECTORY '/var/spool/abrt/ccpp-2017-10-13-17:22:47-32127'
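With the core dump deleted, the kernel "general protection" line is the only remaining record of where the crash happened. The following is a minimal sketch, not part of the original report, that pulls the instruction pointer and mapping base out of that line. The regular expression and the `parse_kernel_fault` helper are hypothetical names written against the exact line format shown above; other kernel versions may format the line differently.

```python
import re

# Hypothetical pattern, written to match the kernel line quoted in this report:
#   comm[pid] general protection ip:<hex> sp:<hex> error:<hex> in image[base+size]
KERNEL_FAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\] general protection "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<error>[0-9a-f]+) "
    r"in (?P<image>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_kernel_fault(line):
    """Return the fault details from one kernel log line, or None if it does not match."""
    m = KERNEL_FAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    size = int(m.group("size"), 16)
    return {
        "pid": int(m.group("pid")),
        "image": m.group("image"),
        "ip": ip,
        "base": base,
        # Offset of the faulting instruction from the start of the mapping.
        "offset": ip - base,
        # Sanity check: the instruction pointer should fall inside the mapping.
        "inside_mapping": base <= ip < base + size,
    }


line = ("Oct 13 17:22:45 database kernel: postmaster[32127] general protection "
        "ip:4bd467 sp:7ffd9b349990 error:0 in postgres[400000+692000]")
fault = parse_kernel_fault(line)
print(hex(fault["ip"]), hex(fault["offset"]), fault["inside_mapping"])
# prints: 0x4bd467 0xbd467 True
```

Assuming matching debug symbols are installed and the binary is not position-independent, the resulting address could then be given to a symbolizer, for example `addr2line -f -e /usr/pgsql-10/bin/postgres 0x4bd467`, to get a function name even without a core file.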
postgres=# SELECT * FROM bak_postgres_log_2017_10_13_1700 WHERE pid=32127 ORDER BY log_time DESC LIMIT 9;
-[ RECORD 1 ]----------+---------------------------------------------------------------------------------------------------------
log_time | 2017-10-13 17:22:45.56-04
pid | 32127
session_id | 59e12e67.7d7f
session_line | 2
command_tag |
session_start_time | 2017-10-13 17:21:43-04
error_severity | ERROR
sql_state_code | 57014
message | canceling autovacuum task
context | processing work entry for relation "gtt.public.cdrs_eric_egsnpdprecord_2017_10_13_recordopeningtime_idx"
-[ RECORD 2 ]----------+---------------------------------------------------------------------------------------------------------
log_time | 2017-10-13 17:22:44.557-04
pid | 32127
session_id | 59e12e67.7d7f
session_line | 1
session_start_time | 2017-10-13 17:21:43-04
error_severity | ERROR
sql_state_code | 57014
message | canceling autovacuum task
context | automatic analyze of table "gtt.public.cdrs_huawei_sgsnpdprecord_2017_10_13"
Time: 375.552 ms
It looks like this table was being inserted into simultaneously by a Python
program using multiprocessing. Each subprocess appears to have been INSERTing
into several tables, each of which has one BRIN index on a timestamp column.
gtt=# \dt+ cdrs_eric_egsnpdprecord_2017_10_13
public | cdrs_eric_egsnpdprecord_2017_10_13 | table | gtt | 5841 MB |
gtt=# \di+ cdrs_eric_egsnpdprecord_2017_10_13_recordopeningtime_idx
public | cdrs_eric_egsnpdprecord_2017_10_13_recordopeningtime_idx | index | gtt | cdrs_eric_egsnpdprecord_2017_10_13 | 136 kB |
I don't have any reason to believe there's a memory issue on the server, so I
suppose this is just a "heads up" to early adopters until/in case it happens
again and I can at least provide a stack trace.
Justin