From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: aggregate crash |
Date: | 2020-01-16 03:16:30 |
Message-ID: | 20200116031630.7ip7lid23tpyhna2@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2020-01-15 12:47:47 -0800, Andres Freund wrote:
> FWIW, I'm working on narrowing it down to something small. I can
> reliably trigger the bug, and I understand the mechanics, I
> think. Interestingly enough the reproducer currently only triggers on
> v12, not on v11 and before.
That's just happenstance due to allocation changes in plpgsql,
though. The attached small reproducer, for me, reliably triggers crashes
on 10 - master.
It's hard to hit intentionally, because plpgsql does a datumCopy() to
its non-null return value, which means that to hit the bug, one needs
different numbers of allocations between setting up the transition value
with transvalueisnull = true, transvalue = 0xsomepointer (because
plpgsql doesn't copy NULLs), and the transition output with
transvalueisnull = false, transvalue = 0xsomepointer. Which is necessary
to trigger the bug, as it's then not reparented into a long lived enough
context. To be then freed/accessed for the next group input value.
I think this is too finnicky to actually keep as a regression test.
The bug, in a way, exists all the way back, but it's a bit harder to
create NULL values where the datum component isn't 0.
To fix I suggest we, in all branches, do the equivalent of adding
something like:
diff --git i/src/backend/executor/execExprInterp.c w/src/backend/executor/execExprInterp.c
index 790380051be..3260a63ac6b 100644
--- i/src/backend/executor/execExprInterp.c
+++ w/src/backend/executor/execExprInterp.c
@@ -4199,6 +4199,12 @@ ExecAggTransReparent(AggState *aggstate, AggStatePerTrans pertrans,
pertrans->transtypeByVal,
pertrans->transtypeLen);
}
+ else
+ {
+ /* ensure datum component is 0 for NULL transition values */
+ newValue = (Datum) 0;
+ }
+
if (!oldValueIsNull)
{
if (DatumIsReadWriteExpandedObject(oldValue,
and a comment explaining why it's (now) safe to rely on datum
comparisons for
if (DatumGetPointer(newVal) != DatumGetPointer(pergroup->transValue))
I don't think it makes sense to add something like it to the byval case
- there's plenty other ways a function returning != 0 with
fcinfo->isnull == true can cause such values to exist. And that's
longstanding.
A separate question is whether it's worth adding code to
e.g. EEO_CASE(EEOP_FUNCEXPR_STRICT) also resetting *op->resvalue to
(Datum) 0. I don't personally don't think ensuring the datum is always
0 when isnull true is all that helpful, if we can't guarantee it
everywhere. So I'm a bit loathe to add cycles to places that don't need
it, and are hot.
Regards,
Andres
Attachment | Content-Type | Size |
---|---|---|
xrepro.sql | application/x-sql | 1.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2020-01-16 03:22:33 | Re: Setting min/max TLS protocol in clientside libpq |
Previous Message | Thomas Munro | 2020-01-16 02:59:55 | Re: pgindent && weirdness |