Re: Extract numeric filed in JSONB more effectively

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Chapman Flack <chap(at)anastigmatix(dot)net>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Extract numeric filed in JSONB more effectively
Date: 2023-08-30 04:47:35
Message-ID: CAKU4AWrxBXomhGcubsLqBj4FRbOf1AEXqpwBsm-mGJEJ0G_ypA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(Sorry for leaving this discussion for such a long time, how times fly!)

On Sun, Aug 27, 2023 at 6:28 AM Chapman Flack <chap(at)anastigmatix(dot)net> wrote:

> On 2023-08-22 08:16, Chapman Flack wrote:
> > On 2023-08-22 01:54, Andy Fan wrote:
> >> After we label it, we will get error like this:
> >>
> >> select (a->'a')::int4 from m;
> >> ERROR: cannot display a value of type internal
> >
> > Without looking in depth right now, I would double-check
> > what relabel node is being applied at the result. The idea,
> > of course, was to relabel the result as the expected result
>
> as I suspected, looking at v10-0003, there's this:
>
> + fexpr = (FuncExpr *)makeRelabelType((Expr *) fexpr, INTERNALOID,
> + 0, InvalidOid,
> COERCE_IMPLICIT_CAST);
>
> compared to the example I had sent earlier:
>
> On 2023-08-18 17:02, Chapman Flack wrote:
> > expr = (Expr *)makeRelabelType((Expr *)fexpr,
> > targetOid, -1, InvalidOid, COERCE_IMPLICIT_CAST);
>
> The key difference: this is the label going on the result type
> of the function we are swapping in.

I'm feeling we have some understanding gap in this area, let's
see what it is. Suppose the original query is:

numeric(jsonb_object_field(v_jsonb, text)) -> numeric.

without the patch 003, the rewritten query is:
jsonb_object_field_type(NUMERICOID, v_jsonb, text) -> NUMERIC.

However the declared type of jsonb_object_field_type is:

jsonb_object_field_type(internal, jsonb, text) -> internal.

So the situation is: a). We input a CONST(type=OIDOID, ..) for an
internal argument. b). We return a NUMERIC type which matches
the original query c). result type NUMERIC doesn't match the declared
type 'internal' d). it doesn't match the run-time type of internal
argument which is OID.

case a) is fixed by RelableType. case b) shouldn't be treat as an
issue. I thought you wanted to address the case c), and patch
003 tries to fix it, then the ERROR above. At last I realized case
c) isn't the one you want to fix. case d) shouldn't be requirement
at the first place IIUC.

So your new method is:
1. jsonb_{op}_start() -> internal (internal actually JsonbValue).
2. jsonb_finish_{type}(internal, ..) -> type. (internal is JsonbValue ).

This avoids the case a) at the very beginning. I'd like to provides
patches for both solutions for comparison. Any other benefits of
this method I am missing?

> Two more minor differences: (1) the node you get from
> makeRelabelType is an Expr, but not really a FuncExpr. Casting
> it to FuncExpr is a bit bogus. Also, the third argument to
> makeRelabelType is a typmod, and I believe the "not-modified"
> typmod is -1, not 0.
>

My fault, you are right.

>
> On 2023-08-21 06:50, Andy Fan wrote:
> > I'm not very excited with this manner, reasons are: a). It will have
> > to emit more steps in ExprState->steps which will be harmful for
> > execution. The overhead is something I'm not willing to afford.
>
> I would be open to a performance comparison, but offhand I am not
> sure whether the overhead of another step or two in an ExprState
> is appreciably more than some of the overhead in the present patch,
> such as the every-time-through fcinfo initialization buried in
> DirectFunctionCall1 where you don't necessarily see it. I bet

the fcinfo in an ExprState step gets set up once, and just has
> new argument values slammed into it each time through.
>

fcinfo initialization in DirectFunctionCall1 is an interesting point!
so I am persuaded the extra steps in ExprState may not be
worse than the current way due to the "every-time-through
fcinfo initialization" (in which case the memory is allocated
once in heap rather than every time in stack). I can do a
comparison at last to see if we can find some other interesting
findings.

> I would not underestimate the benefit of reducing the code
> duplication and keeping the patch as clear as possible.
> The key contributions of the patch are getting a numeric or
> boolean efficiently out of the JSON operation. Getting from
> numeric to int or float are things the system already does
> well.

True, reusing the casting system should be better than hard-code
the casting function manually. I'd apply this on both methods.

> A patch that focuses on what it contributes, and avoids
> redoing things the system already can do--unless the duplication
> can be shown to have a strong performance benefit--is easier to
> review and probably to get integrated.
>

Agreed.

At last, thanks for the great insights and patience!

--
Best Regards
Andy Fan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2023-08-30 05:27:55 Re: [PoC] pg_upgrade: allow to upgrade publisher node
Previous Message Amit Kapila 2023-08-30 04:16:10 Re: persist logical slots to disk during shutdown checkpoint