v12.0: segfault in reindex CONCURRENTLY

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>
Subject: v12.0: segfault in reindex CONCURRENTLY
Date: 2019-10-12 00:44:46
Message-ID: 20191012004446.GT10470@telsasoft.com
Lists: pgsql-hackers

One of our servers crashed last night like this:

< 2019-10-10 22:31:02.186 EDT postgres >STATEMENT: REINDEX INDEX CONCURRENTLY child.eric_umts_rnc_utrancell_hsdsch_eul_201910_site_idx
< 2019-10-10 22:31:02.399 EDT >LOG: server process (PID 29857) was terminated by signal 11: Segmentation fault
< 2019-10-10 22:31:02.399 EDT >DETAIL: Failed process was running: REINDEX INDEX CONCURRENTLY child.eric_umts_rnc_utrancell_hsdsch_eul_201910_site_idx
< 2019-10-10 22:31:02.399 EDT >LOG: terminating any other active server processes

ts=# \d+ child.eric_umts_rnc_utrancell_hsdsch_eul_201910_site_idx
      Index "child.eric_umts_rnc_utrancell_hsdsch_eul_201910_site_idx"
 Column  |  Type   | Key? | Definition | Storage | Stats target
---------+---------+------+------------+---------+--------------
 site_id | integer | yes  | site_id    | plain   |
btree, for table "child.eric_umts_rnc_utrancell_hsdsch_eul_201910"

That's an index on a table partition, but not itself a child of a relkind=I
index.

Unfortunately, there was no core file, and I'm still trying to reproduce it.

I see no evidence that the table was INSERTed into during the reindex, but it
looks like it was SELECTed from, and the report finished within 1 sec of the
crash:

(2019-10-10 22:30:50,485 - p1604 t140325365622592 - INFO): PID 1604 finished running report; est=None rows=552; cols=83; [...] duration:12

postgres=# SELECT log_time, pid, session_id, left(message,99), detail FROM postgres_log_2019_10_10_2200 WHERE pid=29857 OR (log_time BETWEEN '2019-10-10 22:31:02.18' AND '2019-10-10 22:31:02.4' AND NOT message~'crash of another') ORDER BY log_time LIMIT 9;
2019-10-10 22:30:24.441-04 | 29857 | 5d9fe93f.74a1 | temporary file: path "base/pgsql_tmp/pgsql_tmp29857.0.sharedfileset/0.0", size 3096576 |
2019-10-10 22:30:24.442-04 | 29857 | 5d9fe93f.74a1 | temporary file: path "base/pgsql_tmp/pgsql_tmp29857.0.sharedfileset/1.0", size 2809856 |
2019-10-10 22:30:24.907-04 | 29857 | 5d9fe93f.74a1 | process 29857 still waiting for ShareLock on virtual transaction 30/103010 after 333.078 ms | Process holding the lock: 29671. Wait queue: 29857.
2019-10-10 22:31:02.186-04 | 29857 | 5d9fe93f.74a1 | process 29857 acquired ShareLock on virtual transaction 30/103010 after 37611.995 ms |
2019-10-10 22:31:02.186-04 | 29671 | 5d9fe92a.73e7 | duration: 50044.778 ms statement: SELECT fn, sz FROM +|
| | | (SELECT file_name fn, file_size_bytes sz, +|
| | | |
2019-10-10 22:31:02.399-04 | 1161 | 5d9cad9e.489 | terminating any other active server processes |
2019-10-10 22:31:02.399-04 | 1161 | 5d9cad9e.489 | server process (PID 29857) was terminated by signal 11: Segmentation fault | Failed process was running: REINDEX INDEX CONCURRENTLY child.eric_umts_rnc_utrancell_hsdsch_eul_201910_site_idx
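
As a sanity check on the timeline in the log rows above, here is a minimal sketch that recomputes the intervals from the logged timestamps and durations. The variable names are made up for illustration, and the "-04" zone suffix is dropped because every row uses the same offset. It only checks the arithmetic and says nothing about the cause of the crash.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S.%f"


def ts(text):
    """Parse a log timestamp (time zone suffix already stripped)."""
    return datetime.strptime(text, FMT)


# Values copied from the log rows above; names are illustrative only.
still_waiting_at = ts("2019-10-10 22:30:24.907")    # "still waiting for ShareLock"
waited_so_far = timedelta(milliseconds=333.078)
lock_acquired_at = ts("2019-10-10 22:31:02.186")    # "acquired ShareLock"
total_wait = timedelta(milliseconds=37611.995)
select_logged_at = ts("2019-10-10 22:31:02.186")    # PID 29671 "duration:" line
select_duration = timedelta(milliseconds=50044.778)
segfault_logged_at = ts("2019-10-10 22:31:02.399")  # postmaster reports signal 11

# Both lock messages should imply the same start of the wait.
wait_start_from_first = still_waiting_at - waited_so_far
wait_start_from_second = lock_acquired_at - total_wait

# When the SELECT in PID 29671 (the lock holder) began.
select_started_at = select_logged_at - select_duration

# Time between acquiring the lock and the segfault being logged.
crash_gap = segfault_logged_at - lock_acquired_at

print("wait started (1st msg):", wait_start_from_first)
print("wait started (2nd msg):", wait_start_from_second)
print("SELECT started:        ", select_started_at)
print("lock acquired -> crash:", crash_gap)
```

The two lock messages agree on when the wait began to within a millisecond, the SELECT in PID 29671 started before the reindex began waiting on it, and the segfault was logged 213 ms after the reindex acquired the ShareLock.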

Justin
