From: | "Stephan Knauss" <sknauss(at)gmx(dot)de> |
---|---|
To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #16024: segfault ip 0000560103865c60 error 4 in postgres |
Date: | 2019-09-27 17:21:20 |
Message-ID: | trinity-8e7f41eb-5919-4521-9b49-63734d4991f6-1569604880965@3c-app-gmx-bap15 |
Lists: | pgsql-bugs |
<html><head></head><body><div style="font-family: Verdana;font-size: 12.0px;"><div>
<div>Hello,</div>
<div> </div>
<div>I tried to condense the use case further to make it easier to figure out the root cause.</div>
<div> </div>
<div>The call involved here was recently modified. Could it be a regression?<br/>
https://github.com/postgres/postgres/commit/2938aa2a5b1cebb41f9e54c1ea289c286139c21e</div>
<div> </div>
<div>@Tom Lane: As you reviewed that change, could you please have another look at the details?</div>
<div><br/>
Please find further hints below, in the hope that they help you get to the root cause.</div>
<div> </div>
<div>While the issue appears frequently when I run the task from a cron job, it crashes much more sporadically when I run it manually.</div>
<div>Executing the full query using pgAdmin crashed postgres. Subsequent runs seem to work fine.</div>
<div> </div>
<div>Today I tried to execute smaller parts of the query.</div>
<div> </div>
<div>The following query caused a segfault:</div>
<div>SELECT count(1) FROM planet_osm_ways WHERE ARRAY['motorway','trunk','primary','secondary','tertiary'] && tags;</div>
<div>postgres[32041]: segfault at 557f561cddbc ip 0000557f5450bc60 sp 00007ffe79c8a6c0 error 4 in postgres[557f541bc000+64d000]</div>
<div> </div>
<div>The instruction pointer is at least similar to the one in the initial crash.</div>
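<div>To make that comparison concrete, here is a minimal sketch that takes apart the kernel log line quoted above. The helper names (parse_segfault, decode_error) are made up for illustration. The bit meanings assume the standard x86 page-fault error code: bit 0 = page was present, bit 1 = write access, bit 2 = user mode.</div>

```python
import re

# Kernel log line quoted in this message (x86-64 Linux format).
LOG_LINE = (
    "postgres[32041]: segfault at 557f561cddbc ip 0000557f5450bc60 "
    "sp 00007ffe79c8a6c0 error 4 in postgres[557f541bc000+64d000]"
)
# Instruction pointer reported for the initial crash.
INITIAL_IP = 0x0000560103865C60

SEGFAULT_RE = re.compile(
    r"segfault at ([0-9a-f]+) ip ([0-9a-f]+) sp ([0-9a-f]+) "
    r"error ([0-9a-f]+) in ([^\[\s]+)\[([0-9a-f]+)\+([0-9a-f]+)\]"
)


def parse_segfault(line):
    """Hypothetical helper: split a kernel segfault line into numeric fields."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    addr, ip, sp, err, obj, base, size = m.groups()
    return {
        "addr": int(addr, 16),   # faulting data address
        "ip": int(ip, 16),       # instruction pointer
        "sp": int(sp, 16),       # stack pointer
        "err": int(err, 16),     # page-fault error code
        "obj": obj,              # mapped object containing ip
        "base": int(base, 16),   # start of that mapping
        "size": int(size, 16),   # length of that mapping
    }


def decode_error(err):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }


info = parse_segfault(LOG_LINE)
ip_offset = info["ip"] - info["base"]      # offset of the instruction in the mapping
addr_offset = info["addr"] - info["base"]  # distance of the bad address from the base

print("ip offset in", info["obj"], "=", hex(ip_offset))
print("ip inside mapping:", ip_offset in range(info["size"]))
print("fault address offset =", hex(addr_offset))
print("fault address inside mapping:", addr_offset in range(info["size"]))
print("error bits:", decode_error(info["err"]))
print("same page offset as initial crash:",
      (info["ip"] & 0xFFF) == (INITIAL_IP & 0xFFF))
```

<div>For this line the instruction sits at offset 0x34fc60 inside the postgres mapping, and error 4 decodes to a user-mode read of a page that is not mapped. The faulting address lies 0x2011dbc bytes above the mapping base, that is outside the 0x64d000 bytes of the executable mapping itself. Both instruction pointers end in 0xc60. Address space randomization moves the load address in whole pages, so this matches the assumption that both crashes hit the same instruction, but it does not prove it as long as the mapping base of the initial crash is unknown.</div>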
<div> </div>
<div>This is the table queried:</div>
<div> </div>
<div>Table "public.planet_osm_ways" <br/>
Column | Type | Modifiers <br/>
--------+----------+----------- <br/>
id | bigint | not null <br/>
nodes | bigint[] | not null <br/>
tags | text[] | <br/>
Indexes: <br/>
"planet_osm_ways_pkey" PRIMARY KEY, btree (id) <br/>
"planet_osm_ways_nodes" gin (nodes) WITH (fastupdate=off) </div>
<div><br/>
I checked: all tags arrays have a cardinality of at least 2.</div>
<div> </div>
<div>The table planet_osm_ways is updated every few minutes by parallel tasks. Could it be some glitch in the table update?</div>
<div> </div>
<div>Executing the same query again later works:</div>
<div> </div>
<div>gis=> SELECT count(1) FROM planet_osm_ways WHERE ARRAY['motorway','trunk','primary','secondary','tertiary'] && tags; <br/>
count <br/>
-------- <br/>
765231 <br/>
(1 row) </div>
<div> </div>
<div>Even after waiting a bit longer I was not able to reproduce the issue. But usually it comes back after waiting long enough.</div>
<div> </div>
<div>The initial call-stack showed the crash here:</div>
<div> </div>
<div>segfault ip 0000560103865c60 error 4 in postgres<br/>
#0 deconstruct_array (array=array@entry=0x564008a6bde8,<br/>
elmtype=elmtype@entry=25, elmlen=elmlen@entry=-1, elmbyval=elmbyval@entry=0<br/>
'\000', elmalign=elmalign@entry=105 'i', elemsp=elemsp@entry=0x7ffc5a33d570,<br/>
nullsp=0x7ffc5a33d578, nelemsp=0x7ffc5a33d56c)<br/>
at ./build/../src/backend/utils/adt/arrayfuncs.c:3530</div>
<div> </div>
<div>That line performs only a read operation on the array address plus an offset:<br/>
p = att_addlength_pointer(p, elmlen, p);<br/>
expands to:<br/>
p = (cur_offset) + VARSIZE_ANY(attptr) </div>
<div>with VARSIZE_ANY doing a read like this:<br/>
((((varattrib_1b *) (PTR))->va_header) == 0x01)</div>
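<div>The walk described above can be modeled in a few lines. This is only a simplified sketch and not the PostgreSQL code: it assumes a little-endian machine, models only elements with a short 1-byte varlena header (total length stored in the upper 7 bits), and ignores null bitmaps and alignment padding. The names varsize_short, short_text and walk are made up for illustration. A Python IndexError stands in for the out-of-bounds read that would be a segfault in C.</div>

```python
def varsize_short(buf, pos):
    """Total size (header included) of the short-header datum at buf[pos]."""
    first = buf[pos]  # the one-byte header read; IndexError if pos is out of bounds
    if first == 0x01:
        # This is the va_header == 0x01 test quoted above (external TOAST pointer).
        raise NotImplementedError("external TOAST pointer is not modeled")
    if not first & 0x01:
        raise NotImplementedError("4-byte headers are not modeled in this sketch")
    return (first >> 1) & 0x7F


def short_text(s):
    """Encode a string as a short-header varlena (illustrative encoder)."""
    data = s.encode()
    total = len(data) + 1  # payload plus the 1-byte header
    if total not in range(2, 128):
        raise ValueError("string does not fit a short header")
    return bytes([total * 2 + 1]) + data


def walk(buf, nelems):
    """Advance element by element, as the pointer p is advanced in the loop."""
    pos = 0
    elems = []
    for _ in range(nelems):
        size = varsize_short(buf, pos)
        elems.append(bytes(buf[pos + 1:pos + size]))
        pos += size  # models p = att_addlength_pointer(p, elmlen, p)
    return elems


good = b"".join(short_text(t) for t in ("motorway", "trunk", "primary"))
print(walk(good, 3))

# Case 1: the element count claims more elements than are stored.
try:
    walk(good, 5)
except IndexError:
    print("element count too large: read past the end of the buffer")

# Case 2: a single corrupted length header throws the cursor far off.
bad = bytearray(good)
bad[0] = 0xFF  # claims a 127-byte element in a 23-byte buffer
try:
    walk(bad, 3)
except IndexError:
    print("corrupt header: read past the end of the buffer")
```

<div>In both failure cases the header read itself is harmless as an operation. It only fails because the position it reads from no longer lies inside the array data.</div>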
<div> </div>
<div>Crashing here would mean that either the array pointer itself is wrong, or the pointer has advanced too far towards the end of the array so that va_header is read from past the end of the array.<br/>
Does the address sound reasonable? I am not that familiar with the virtual address space layout involved here. It is quite close to the instruction pointer address.</div>
<div>array=array@entry=0x564008a6bde8</div>
<div> </div>
<div>As I am doing an array overlap operation and the call stack passes through this function, this suggests that the crashing query could be the one above.</div>
<div>
<div>#2 0x0000564007ada7dd in arrayoverlap (fcinfo=0x564008a5d6e8) </div>
</div>
<div>If I find a bit more time, and if it would provide further details, I could try to get a core dump for such crashes as well. Currently my assumption is that they have the same root cause.</div>
<div> </div>
<div>Please let me know what other details might help in getting an idea of what breaks here.</div>
<div> </div>
<div>Thanks,</div>
<div><br/>
Stephan<br/>
</div>
</div></div></body></html>