From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
Cc: | Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: aggregate crash |
Date: | 2020-01-03 19:35:39 |
Message-ID: | 20200103193539.mzimh6ufogyppsrl@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2019-12-27 20:13:26 +0300, Teodor Sigaev wrote:
> Found crash on production instance, assert-enabled build crashes in pfree()
> call, with default config. v11, v12 and head are affected, but, seems, you
> need to be a bit lucky.
>
> The bug is comparing old and new aggregate pass-by-ref values only by
> pointer value itself, despite on null flag. Any function which returns null
> doesn't worry about actual returned Datum value, so that comparison isn't
> enough. Test case shows bug with ExecInterpExpr() but there several similar
> places (thanks Nikita Glukhov for help).
> Attached patch adds check of null flag.
Hm. I don't understand the problem here. Why do we need to reparent in
that case? What freed the relevant value?
Nor do I really understand why v10 wouldn't be affected if this actually
is a problem. The relevant code is also only guarded by
DatumGetPointer(newVal) != DatumGetPointer(pergroupstate->transValue))
>
> Backtrace from v12 (note, newValue and oldValue are differ on current call,
> but oldValue points into pfreed memory) :
> #0 0x0000000000c8405a in GetMemoryChunkContext (pointer=0x80a808250) at
> ../../../../src/include/utils/memutils.h:130
> 130 AssertArg(MemoryContextIsValid(context));
> (gdb) bt
> #0 0x0000000000c8405a in GetMemoryChunkContext (pointer=0x80a808250) at
> ../../../../src/include/utils/memutils.h:130
> #1 0x0000000000c85ae5 in pfree (pointer=0x80a808250) at mcxt.c:1058
> #2 0x000000000080475e in ExecAggTransReparent (aggstate=0x80a806370,
> pertrans=0x80a87e830, newValue=34535940744, newValueIsNull=false,
> oldValue=34535932496, oldValueIsNull=false)
> at execExprInterp.c:4209
> #3 0x00000000007ff51f in ExecInterpExpr (state=0x80a87f4d8,
> econtext=0x80a8065a8, isnull=0x7fffffffd7b7) at execExprInterp.c:1747
> #4 0x000000000082c12b in ExecEvalExprSwitchContext (state=0x80a87f4d8,
> econtext=0x80a8065a8, isNull=0x7fffffffd7b7) at
> ../../../src/include/executor/executor.h:308
> #5 0x000000000082bc0f in advance_aggregates (aggstate=0x80a806370) at nodeAgg.c:679
> #6 0x000000000082b8a6 in agg_retrieve_direct (aggstate=0x80a806370) at
> nodeAgg.c:1847
> #7 0x0000000000828782 in ExecAgg (pstate=0x80a806370) at nodeAgg.c:1572
> #8 0x000000000080e712 in ExecProcNode (node=0x80a806370) at
> ../../../src/include/executor/executor.h:240
> How to reproduce:
> http://sigaev.ru/misc/xdump.sql.bz2
> bzcat xdump.sql.bz2 | psql postgres && psql postgres < x.sql
It should be possible to create a smaller reproducer... It'd be good if
a bug fix for this were committed with a regression test.
> diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
> index 034970648f3..3b5333716d4 100644
> --- a/src/backend/executor/execExprInterp.c
> +++ b/src/backend/executor/execExprInterp.c
> @@ -1743,7 +1743,8 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
> * expanded object that is already a child of the aggcontext,
> * assume we can adopt that value without copying it.
> */
> - if (DatumGetPointer(newVal) != DatumGetPointer(pergroup->transValue))
> + if (DatumGetPointer(newVal) != DatumGetPointer(pergroup->transValue) ||
> + fcinfo->isnull != pergroup->transValueIsNull)
> newVal = ExecAggTransReparent(aggstate, pertrans,
> newVal, fcinfo->isnull,
> pergroup->transValue,
I'd really like to avoid adding additional branches to these paths.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Stephen Frost | 2020-01-03 19:35:59 | Re: backup manifests |
Previous Message | Robert Haas | 2020-01-03 19:32:01 | Re: Greatest Common Divisor |