Re: BUG #18657: Using JSON_OBJECTAGG with volatile function leads to segfault

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tender Wang <tndrwang(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, exclusion(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18657: Using JSON_OBJECTAGG with volatile function leads to segfault
Date: 2024-10-17 03:52:09
Message-ID: CA+HiwqGAU0tFHBp=uxcvxHbLKqAihbWyEfpYy-7MEkdCVGyoXA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Thu, Oct 17, 2024 at 9:56 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Amit Langote <amitlangote09(at)gmail(dot)com> writes:
> > On Wed, Oct 16, 2024 at 11:26 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Would it be better for parse analysis to explicitly NULL out this
> >> field once it's done looking at it? Carrying unmaintained pieces
> >> of an expression tree around seems both inefficient and prone to
> >> future failures of this same ilk. The further the raw_expr gets
> >> out of step with current reality, the worse the hazards.
>
> > It seems we can't do that in this case, because get_rule_expr() wants
> > to read it.
>
> I think you just increased my level of concern. Aren't you saying
> that get_rule_expr is likely to deliver utter garbage, because it
> will be working from an expression that has not tracked
> transformations made to other parts of the tree?
>
> Also, I see that ruleutils is applying get_rule_expr to raw_expr,
> which AFAICT means it's *not* raw-parser output, else that wouldn't
> work at all. Which means the field is badly named and incorrectly
> documented. But in particular it means you cannot just make the
> tree-walking routines ignore it --- you *must* update it during
> transformations such as subquery pullup, or you will have garbage
> that will cause EXPLAIN to crash or at least produce wrong results.

One thing we could do is have get_rule_expr() print formatted_expr.
While this would make the output a bit verbose as shown in one of the
diffs I get, it would at least avoid the issues you mentioned with
printing raw_expr.

EXPLAIN (VERBOSE, COSTS OFF) SELECT JSON('123'::bytea FORMAT JSON);
- QUERY PLAN
------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------------------------------
Result
- Output: JSON('\x313233'::bytea FORMAT JSON)
+ Output: JSON((convert_from('\x313233'::bytea, 'UTF8'::name))::json
FORMAT JSON)

> In other words, the proposed patch is dangerously wrong. I suspect at
> this point that the true bug might be the opposite: somebody feeling
> that they could dispense with updating raw_expr when they shouldn't.

The issue in this particular case is that preprocess_aggref() adds the
AggTransInfo of the Aggref mentioned in raw_expr because
expression_tree_walker() processes it. However, since raw_expr is
ignored by the executor, the Aggref contained in it is not added to
AggState.aggs. This causes the aggno and aggtransno values of the
Aggrefs that are added to become out of sync, leading to a crash.

I don't see a way to control whether raw_expr is processed by
preprocess_aggref other than ignoring it in expression_tree_walker().
But maybe I misunderstood you.

--
Thanks, Amit Langote

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2024-10-17 04:12:09 Re: BUG #18657: Using JSON_OBJECTAGG with volatile function leads to segfault
Previous Message Andrei Lepikhov 2024-10-17 02:51:12 Re: Performance Issue on Query 18 of TPC-H Benchmark