Re: gs_group_1 crashing on 13beta2/s390x

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christoph Berg <myon(at)debian(dot)org>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gs_group_1 crashing on 13beta2/s390x
Date: 2020-07-15 21:45:35
Message-ID: 3176347.1594849535@sss.pgh.pa.us
Lists: pgsql-hackers

Christoph Berg <myon(at)debian(dot)org> writes:
>> On the Debian s390x buildd, the 13beta2 build is crashing:

> I wired gdb into the build process and got this backtrace:

> #0 datumCopy (typByVal=false, typLen=-1, value=0) at ./build/../src/backend/utils/adt/datum.c:142
> vl = 0x0
> res = <optimized out>
> res = <optimized out>
> vl = <optimized out>
> eoh = <optimized out>
> resultsize = <optimized out>
> resultptr = <optimized out>
> realSize = <optimized out>
> resultptr = <optimized out>
> realSize = <optimized out>
> resultptr = <optimized out>
> #1 datumCopy (value=0, typByVal=false, typLen=-1) at ./build/../src/backend/utils/adt/datum.c:131
> res = <optimized out>
> vl = <optimized out>
> eoh = <optimized out>
> resultsize = <optimized out>
> resultptr = <optimized out>
> realSize = <optimized out>
> resultptr = <optimized out>
> #2 0x000002aa04423af8 in finalize_aggregate (aggstate=aggstate@entry=0x2aa05775920, peragg=peragg@entry=0x2aa056e02f0, resultVal=0x2aa056e0208, resultIsNull=0x2aa056e022a, pergroupstate=<optimized out>, pergroupstate=<optimized out>) at ./build/../src/backend/executor/nodeAgg.c:1128

Hmm. If gdb isn't lying to us, that has to be coming from here:

    /*
     * If result is pass-by-ref, make sure it is in the right context.
     */
    if (!peragg->resulttypeByVal && !*resultIsNull &&
        !MemoryContextContains(CurrentMemoryContext,
                               DatumGetPointer(*resultVal)))
        *resultVal = datumCopy(*resultVal,
                               peragg->resulttypeByVal,
                               peragg->resulttypeLen);

The line numbers in HEAD are a bit different, but that's the only
call of datumCopy() in finalize_aggregate().

It's hardly surprising that datumCopy would segfault when given
a null "value" and told it is pass-by-reference. However, to get to
the datumCopy call, we must have passed the MemoryContextContains
check on that very same pointer value, and that would surely have
segfaulted as well, one would think.

Given the apparently-can't-happen situation at the call site,
and the fact that we're not seeing similar failures reported
elsewhere (and note that every line shown above is at least
five years old), I'm kind of forced to the conclusion that this
is a compiler bug. Does adjusting the -O level make it go away?

regards, tom lane
