Re: Patch: Improve Boolean Predicate JSON Path Docs

From: Erik Wienhold <ewie(at)ewie(dot)name>
To: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Patch: Improve Boolean Predicate JSON Path Docs
Date: 2023-10-14 23:51:05
Message-ID: qpk36h5raavf66qpkthnynb3hhg3v7kj5xiybj3pe6gvraxqms@6fmeqdglg62q
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2023-10-14 22:40 +0200, David E. Wheeler write:
> Following up from a suggestion from Tom Lane[1] to improve the
> documentation of boolean predicate JSON path expressions, please find
> enclosed a draft patch to do so.

Thanks for putting this together. See my review at the end.

> It does three things:
>
> 1. Converts all of the example path queries to use jsonb_path_query()
> and show the results, to make it clearer what the behaviors are.

Nice. This really does help to make some sense of it. I checked all
queries and they do work out except for two queries where the path
expression string is not properly quoted (but the intended output is
still correct).

> 2. Replaces the list of deviations from the standards with a new
> subsection, with each deviation in its own sub-subsection. The regex
> section is unchanged, but I’ve greatly expanded the boolean expression
> JSON path section with examples comparing standard filter expressions
> and nonstandard boolean predicates. I’ve also added an exhortation not
> use boolean expressions with @? or standard path expressions with @@.

LGTM.

> 3. While converting the modes section to use jsonb_path_query() and
> show the results, I also added an example of strict mode returning an
> error.
>
> Follow-ups I’d like to make:
>
> 1. Expand the modes section to show how the types of results can vary
> depending on the mode, thanks to the flattening. Examples:
>
> david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', '$.a ?(@[*] > 2)');
> jsonb_path_query
> ------------------
> 3
> 4
> 5
> (3 rows)
>
> david=# select jsonb_path_query('{"a":[1,2,3,4,5]}', 'strict $.a ?(@[*] > 2)');
> jsonb_path_query
> ------------------
> [1, 2, 3, 4, 5]
>
> 2. Improve the descriptions and examples for @?/jsonb_path_exists()
> and @@/jsonb_path_match().

+1

> [1] https://www.postgresql.org/message-id/1229727.1680535592%40sss.pgh.pa.us

My review:

> diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
> index affd1254bb..295f8ca5c9 100644
> --- a/doc/src/sgml/func.sgml
> +++ b/doc/src/sgml/func.sgml
> @@ -17205,7 +17205,7 @@ array w/o UK? | t
> For example, suppose you have some JSON data from a GPS tracker that you
> would like to parse, such as:
> <programlisting>
> -{
> + \set json '{

Perhaps make it explicit that the reader must run this in psql in order
to use \set and :'json' in the ensuing samples? Some of the existing
examples already use psql output but they do not rely on any psql
features.

> "track": {
> "segments": [
> {
> @@ -17220,7 +17220,7 @@ array w/o UK? | t
> }
> ]
> }
> -}
> +}'
> </programlisting>
> </para>
>
> @@ -17229,7 +17229,10 @@ array w/o UK? | t
> <literal>.<replaceable>key</replaceable></literal> accessor
> operator to descend through surrounding JSON objects:
> <programlisting>
> -$.track.segments
> +select jsonb_path_query(:'json'::jsonb, '$.track.segments');
> + jsonb_path_query
> +-------------------------------------------------------------------------------------------------------------------------------------------------------------------
> + [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"}]
> </programlisting>

This should use <screen>, <userinput>, and <computeroutput> if it shows
a psql session, e.g.:

<screen>
<userinput>select jsonb_path_query(:'json', '$.track.segments');</userinput>
<computeroutput>
jsonb_path_query
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
[{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"}]
</computeroutput>
</screen>

Also the cast to jsonb is not necessary and only adds clutter IMO.

> </para>
>
> @@ -17239,7 +17242,11 @@ $.track.segments
> the following path will return the location coordinates for all
> the available track segments:
> <programlisting>
> -$.track.segments[*].location
> +select jsonb_path_query(:'json'::jsonb, '$.track.segments[*].location');
> + jsonb_path_query
> +-------------------
> + [47.763, 13.4034]
> + [47.706, 13.2635]
> </programlisting>
> </para>
>
> @@ -17248,7 +17255,10 @@ $.track.segments[*].location
> specify the corresponding subscript in the <literal>[]</literal>
> accessor operator. Recall that JSON array indexes are 0-relative:
> <programlisting>
> -$.track.segments[0].location
> +select jsonb_path_query(:'json'::jsonb, 'strict $.track.segments[0].location');
> + jsonb_path_query
> +-------------------
> + [47.763, 13.4034]
> </programlisting>
> </para>
>
> @@ -17259,7 +17269,10 @@ $.track.segments[0].location
> Each method name must be preceded by a dot. For example,
> you can get the size of an array:
> <programlisting>
> -$.track.segments.size()
> +select jsonb_path_query(:'json'::jsonb, 'strict $.track.segments.size()');
> + jsonb_path_query
> +------------------
> + 2
> </programlisting>
> More examples of using <type>jsonpath</type> operators
> and methods within path expressions appear below in
> @@ -17302,7 +17315,10 @@ $.track.segments.size()
> For example, suppose you would like to retrieve all heart rate values higher
> than 130. You can achieve this using the following expression:
> <programlisting>
> -$.track.segments[*].HR ? (@ &gt; 130)
> +select jsonb_path_query(:'json'::jsonb, '$.track.segments[*].HR ? (@ &gt; 130)');
> + jsonb_path_query
> +------------------
> + 135
> </programlisting>
> </para>
>
> @@ -17312,7 +17328,10 @@ $.track.segments[*].HR ? (@ &gt; 130)
> filter expression is applied to the previous step, and the path used
> in the condition is different:
> <programlisting>
> -$.track.segments[*] ? (@.HR &gt; 130)."start time"
> + select jsonb_path_query(:'json'::jsonb, '$.track.segments[*] ? (@.HR &gt; 130)."start time"');
> + jsonb_path_query
> +-----------------------
> + "2018-10-14 10:39:21"
> </programlisting>
> </para>
>
> @@ -17321,7 +17340,10 @@ $.track.segments[*] ? (@.HR &gt; 130)."start time"
> example, the following expression selects start times of all segments that
> contain locations with relevant coordinates and high heart rate values:
> <programlisting>
> -$.track.segments[*] ? (@.location[1] &lt; 13.4) ? (@.HR &gt; 130)."start time"
> +select jsonb_path_query(:'json'::jsonb, '$.track.segments[*] ? (@.location[1] &lt; 13.4) ? (@.HR &gt; 130)."start time"');
> + jsonb_path_query
> +-----------------------
> + "2018-10-14 10:39:21"
> </programlisting>
> </para>
>
> @@ -17330,46 +17352,81 @@ $.track.segments[*] ? (@.location[1] &lt; 13.4) ? (@.HR &gt; 130)."start time"
> The following example first filters all segments by location, and then
> returns high heart rate values for these segments, if available:
> <programlisting>
> -$.track.segments[*] ? (@.location[1] &lt; 13.4).HR ? (@ &gt; 130)
> +select jsonb_path_query(:'json'::jsonb, $.track.segments[*] ? (@.location[1] &lt; 13.4).HR ? (@ &gt; 130)');

The opening quote is missing from the jsonpath literal.

> + jsonb_path_query
> +------------------
> + 135
> </programlisting>
> </para>
>
> <para>
> You can also nest filter expressions within each other:
> <programlisting>
> -$.track ? (exists(@.segments[*] ? (@.HR &gt; 130))).segments.size()
> +select jsonb_path_query(:'json'::jsonb, $.track ? (exists(@.segments[*] ? (@.HR &gt; 130))).segments.size()');

Missing opening quote here as well.

> + jsonb_path_query
> +------------------
> + 2
> </programlisting>
> This expression returns the size of the track if it contains any
> segments with high heart rate values, or an empty sequence otherwise.
> </para>
>
> - <para>
> - <productname>PostgreSQL</productname>'s implementation of the SQL/JSON path
> - language has the following deviations from the SQL/JSON standard:
> - </para>
> + <sect3 id="devations-from-the-standard">
> + <title>Devaiations from the SQL Standard</title>

Typo in "deviations" (section ID and title).

> + <para>
> + <productname>PostgreSQL</productname>'s implementation of the SQL/JSON path
> + language has the following deviations from the SQL/JSON standard:

The sentence should and in a period when this para is no longer followed
by an item list.

> + </para>
>
> - <itemizedlist>
> - <listitem>
> + <sect4 id="boolean-predicate-path-expressions">
> + <title>Boolean Predicate Path Expressions</title>
> <para>
> - A path expression can be a Boolean predicate, although the SQL/JSON
> - standard allows predicates only in filters. This is necessary for
> - implementation of the <literal>@@</literal> operator. For example,
> - the following <type>jsonpath</type> expression is valid in
> - <productname>PostgreSQL</productname>:
> + As an extension to the SQL standard, a <productname>PostgreSQL</productname>
> + path expression can be a Boolean predicate, whereas the SQL standard allows
> + predicates only in filters. Where SQL standard path expressions return the
> + relevant contents of the queried JSON value, predicate path expressions
> + return the three-value three-valued result of the predicate:

Redundant "three-value" before "three-valued result".

> + <literal>true</literal>, <literal>false</literal>, or
> + <literal>unknown</literal>. Compare this filter <type>jsonpath</type>
> + exression:
> <programlisting>
> -$.track.segments[*].HR &lt; 70
> +select jsonb_path_query(:'json'::jsonb, '$.track.segments ?(@[*].HR &gt; 130)');
> + jsonb_path_query
> +---------------------------------------------------------------------------------
> + {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"}
> </programlisting>
> - </para>
> - </listitem>
> + To a predicate expression, which returns <literal>true</literal>
> +<programlisting>
> +select jsonb_path_query(:'json'::jsonb, '$.track.segments[*].HR &gt; 130');
> + jsonb_path_query
> +------------------
> + true
> +</programlisting>
> + </para>
>
> - <listitem>
> - <para>
> - There are minor differences in the interpretation of regular
> - expression patterns used in <literal>like_regex</literal> filters, as
> - described in <xref linkend="jsonpath-regular-expressions"/>.
> - </para>
> - </listitem>
> - </itemizedlist>
> + <para>
> + Predicate-only path expressions are necessary for implementation of the
> + <literal>@@</literal> operator (and the
> + <function>jsonb_path_match</function> function), and should not be used
> + with the <literal>@?</literal> operator (or
> + <function>jsonb_path_exists</function> function).
> + </para>
> +
> + <para>
> + Conversely, non-predicate <type>jsonpath</type> expressions should not be
> + used with the <literal>@@</literal> operator (or the
> + <function>jsonb_path_match</function> function).
> + </para>
> + </sect4>

Both paras should be wrapped in a single <note> so that they stand out
from the rest of the text. Maybe even <warning>, but <note> is already
used on this page for things that I'd consider warnings.

> + <sect4 id="jsonpath-regular-expression-deviation">
> + <title>Regular Expression Interpretation</title>
> + <para>
> + There are minor differences in the interpretation of regular
> + expression patterns used in <literal>like_regex</literal> filters, as
> + described in <xref linkend="jsonpath-regular-expressions"/>.
> + </para>
> + </sect4>

<sect3 id="devations-from-the-standard"> should be closed here,
otherwise the docs won't build. This can be checked with
`make -C doc/src/sgml check`.

>
> <sect3 id="strict-and-lax-modes">
> <title>Strict and Lax Modes</title>
> @@ -17431,18 +17488,30 @@ $.track.segments[*].HR &lt; 70
> abstract from the fact that it stores an array of segments
> when using the lax mode:
> <programlisting>
> -lax $.track.segments.location
> + select jsonb_path_query(:'json'::jsonb, 'lax $.track.segments.location');
> + jsonb_path_query

`git diff --check` shows a couple of lines with trailing whitespace
(mostly psql output).

> +-------------------
> + [47.763, 13.4034]
> + [47.706, 13.2635]
> </programlisting>
> </para>
>
> <para>
> - In the strict mode, the specified path must exactly match the structure of
> + In strict mode, the specified path must exactly match the structure of
> the queried JSON document to return an SQL/JSON item, so using this
> - path expression will cause an error. To get the same result as in
> - the lax mode, you have to explicitly unwrap the
> + path expression will cause an error:
> +<programlisting>
> +select jsonb_path_query(:'json'::jsonb, 'strict $.track.segments.location');
> +ERROR: jsonpath member accessor can only be applied to an object
> +</programlisting>
> + To get the same result as in the lax mode, you have to explicitly unwrap the
> <literal>segments</literal> array:
> <programlisting>
> -strict $.track.segments[*].location
> +select jsonb_path_query(:'json'::jsonb, 'strict $.track.segments[*].location');
> + jsonb_path_query
> +-------------------
> + [47.763, 13.4034]
> + [47.706, 13.2635]
> </programlisting>
> </para>
>
> @@ -17451,7 +17520,13 @@ strict $.track.segments[*].location
> when using the lax mode. For instance, the following query selects every
> <literal>HR</literal> value twice:
> <programlisting>
> -lax $.**.HR
> +select jsonb_path_query(:'json'::jsonb, 'lax $.**.HR');
> + jsonb_path_query
> +------------------
> + 73
> + 135
> + 73
> + 135
> </programlisting>
> This happens because the <literal>.**</literal> accessor selects both
> the <literal>segments</literal> array and each of its elements, while
> @@ -17460,7 +17535,11 @@ lax $.**.HR
> the <literal>.**</literal> accessor only in the strict mode. The
> following query selects each <literal>HR</literal> value just once:
> <programlisting>
> -strict $.**.HR
> +select jsonb_path_query(:'json'::jsonb, 'strict $.**.HR');
> + jsonb_path_query
> +------------------
> + 73
> + 135
> </programlisting>
> </para>
>

--
Erik

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2023-10-15 00:18:43 Re: Asymmetric partition-wise JOIN
Previous Message Alexander Korotkov 2023-10-14 22:34:51 Re: POC, WIP: OR-clause support for indexes