From: | Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> |
---|---|
To: | Chapman Flack <chap(at)anastigmatix(dot)net> |
Cc: | jian he <jian(dot)universality(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Extract numeric filed in JSONB more effectively |
Date: | 2023-08-30 04:47:35 |
Message-ID: | CAKU4AWrxBXomhGcubsLqBj4FRbOf1AEXqpwBsm-mGJEJ0G_ypA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
(Sorry for leaving this discussion for such a long time, how times fly!)
On Sun, Aug 27, 2023 at 6:28 AM Chapman Flack <chap(at)anastigmatix(dot)net> wrote:
> On 2023-08-22 08:16, Chapman Flack wrote:
> > On 2023-08-22 01:54, Andy Fan wrote:
> >> After we label it, we will get error like this:
> >>
> >> select (a->'a')::int4 from m;
> >> ERROR: cannot display a value of type internal
> >
> > Without looking in depth right now, I would double-check
> > what relabel node is being applied at the result. The idea,
> > of course, was to relabel the result as the expected result
>
> as I suspected, looking at v10-0003, there's this:
>
> + fexpr = (FuncExpr *)makeRelabelType((Expr *) fexpr, INTERNALOID,
> + 0, InvalidOid,
> COERCE_IMPLICIT_CAST);
>
> compared to the example I had sent earlier:
>
> On 2023-08-18 17:02, Chapman Flack wrote:
> > expr = (Expr *)makeRelabelType((Expr *)fexpr,
> > targetOid, -1, InvalidOid, COERCE_IMPLICIT_CAST);
>
> The key difference: this is the label going on the result type
> of the function we are swapping in.
I'm feeling we have some understanding gap in this area, let's
see what it is. Suppose the original query is:
numeric(jsonb_object_field(v_jsonb, text)) -> numeric.
without the patch 003, the rewritten query is:
jsonb_object_field_type(NUMERICOID, v_jsonb, text) -> NUMERIC.
However the declared type of jsonb_object_field_type is:
jsonb_object_field_type(internal, jsonb, text) -> internal.
So the situation is: a). We input a CONST(type=OIDOID, ..) for an
internal argument. b). We return a NUMERIC type which matches
the original query c). result type NUMERIC doesn't match the declared
type 'internal' d). it doesn't match the run-time type of internal
argument which is OID.
case a) is fixed by RelableType. case b) shouldn't be treat as an
issue. I thought you wanted to address the case c), and patch
003 tries to fix it, then the ERROR above. At last I realized case
c) isn't the one you want to fix. case d) shouldn't be requirement
at the first place IIUC.
So your new method is:
1. jsonb_{op}_start() -> internal (internal actually JsonbValue).
2. jsonb_finish_{type}(internal, ..) -> type. (internal is JsonbValue ).
This avoids the case a) at the very beginning. I'd like to provides
patches for both solutions for comparison. Any other benefits of
this method I am missing?
> Two more minor differences: (1) the node you get from
> makeRelabelType is an Expr, but not really a FuncExpr. Casting
> it to FuncExpr is a bit bogus. Also, the third argument to
> makeRelabelType is a typmod, and I believe the "not-modified"
> typmod is -1, not 0.
>
My fault, you are right.
>
> On 2023-08-21 06:50, Andy Fan wrote:
> > I'm not very excited with this manner, reasons are: a). It will have
> > to emit more steps in ExprState->steps which will be harmful for
> > execution. The overhead is something I'm not willing to afford.
>
> I would be open to a performance comparison, but offhand I am not
> sure whether the overhead of another step or two in an ExprState
> is appreciably more than some of the overhead in the present patch,
> such as the every-time-through fcinfo initialization buried in
> DirectFunctionCall1 where you don't necessarily see it. I bet
the fcinfo in an ExprState step gets set up once, and just has
> new argument values slammed into it each time through.
>
fcinfo initialization in DirectFunctionCall1 is an interesting point!
so I am persuaded the extra steps in ExprState may not be
worse than the current way due to the "every-time-through
fcinfo initialization" (in which case the memory is allocated
once in heap rather than every time in stack). I can do a
comparison at last to see if we can find some other interesting
findings.
> I would not underestimate the benefit of reducing the code
> duplication and keeping the patch as clear as possible.
> The key contributions of the patch are getting a numeric or
> boolean efficiently out of the JSON operation. Getting from
> numeric to int or float are things the system already does
> well.
True, reusing the casting system should be better than hard-code
the casting function manually. I'd apply this on both methods.
> A patch that focuses on what it contributes, and avoids
> redoing things the system already can do--unless the duplication
> can be shown to have a strong performance benefit--is easier to
> review and probably to get integrated.
>
Agreed.
At last, thanks for the great insights and patience!
--
Best Regards
Andy Fan
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Smith | 2023-08-30 05:27:55 | Re: [PoC] pg_upgrade: allow to upgrade publisher node |
Previous Message | Amit Kapila | 2023-08-30 04:16:10 | Re: persist logical slots to disk during shutdown checkpoint |