Re: Extract numeric filed in JSONB more effectively

From: Chapman Flack <chap(at)anastigmatix(dot)net>
To: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Extract numeric filed in JSONB more effectively
Date: 2023-08-21 03:19:47
Message-ID: 5138c6b5fd239e7ce4e1a4e63826ac27@anastigmatix.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2023-08-20 21:31, Andy Fan wrote:
> Highlighting the user case of makeRelableType is interesting! But using
> the Oid directly looks more promising for this question IMO, it looks
> like:
> "you said we can put anything in this arg, so I put an OID const
> here",
> seems nothing is wrong.

Perhaps one of the more senior developers will chime in, but to me,
leaving out the relabel nodes looks more like "all of PostgreSQL's
type checking happened before the SupportRequestSimplify, so nothing
has noticed that we rewrote the tree with mismatched types, and as
long as nothing crashes we sort of got away with it."

Suppose somebody writes an extension to double-check that plan
trees are correctly typed. Or improves EXPLAIN to check a little more
carefully than it seems to. Omitting the relabel nodes could spell
trouble then.

Or, someone more familiar with the code than I am might say "oh,
mismatches like that are common in rewritten trees, we live with it."
But unless somebody tells me that, I'm not believing it.

But I would say we have proved the concept of SupportRequestSimplify
for this task. :)

Now, it would make me happy to further reduce some of the code
duplication between the original and the _type versions of these
functions. I see that you noticed the duplication in the case of
jsonb_extract_path, and you factored out jsonb_get_jsonbvalue so
it could be reused. There is also some duplication with object_field
and array_element. (Also, we may have overlooked jsonb_path_query
and jsonb_path_query_first as candidates for the source of the
cast. Two more candidates; five total.)

Here is one way this could be structured. Observe that every one
of those five source candidates operates in two stages:

Start: All of the processing until a JsonbValue has been obtained.
Finish: Converting the JsonbValue to some form for return.

Before this patch, there were two choices for Finish:
JsonbValueToJsonb or JsonbValueAsText.

With this patch, there are four Finish choices: those two, plus
PG_RETURN_BOOL(v->val.boolean), PG_RETURN_NUMERIC(v->val.numeric).

Clearly, with rewriting, we can avoid 5✕4 = 20 distinct
functions. The five candidate functions only differ in Start.
Suppose each of those had a _start version, like
jsonb_object_field_start, that only proceeds as far as
obtaining the JsonbValue, and returns that directly (an
'internal' return type). Naturally, each _start function would
need an 'internal' parameter also, even if it isn't used,
just to make sure it is not SQL-callable.

Now consider four Finish functions: jsonb_finish_jsonb,
jsonb_finish_text, jsonb_finish_boolean, jsonb_finish_numeric.

Each would have one 'internal' parameter (a JsonbValue), and
its return type declared normally. There is no need to pass
a type oid to any of these, and they need not contain any
switch to select a return type. The correct finisher to use
is simply chosen once at the time of rewriting.

So cast(jsonb_array_element(jb, 0) as numeric) would just get
rewritten as jsonb_finish_numeric(jsonb_array_element_start(jb,0)).

The other (int and float) types don't need new code; just have
the rewriter add a cast-from-numeric node on top. That's all
those other switch cases in cast_jsonbvalue_to_type are doing,
anyway.

Notice in this structure, less relabeling is needed. The
final return does not need relabeling, because each finish
function has the expected return type. Each finish function's
parameter is typed 'internal' (a JsonbValue), but that's just
what each start function returns, so no relabeling needed
there either.

The rewriter will have to supply some 'internal' constant
as a start-function parameter (because of the necessary
'internal' parameter). It might still be civilized to relabel
that.

Regards,
-Chap

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2023-08-21 03:20:32 Re: [PoC] pg_upgrade: allow to upgrade publisher node
Previous Message Kyotaro Horiguchi 2023-08-21 02:33:28 Re: Fix pg_stat_reset_single_table_counters function