Re: Q: Escapes in jsonpath Idents

From: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
To: Erik Wienhold <ewie(at)ewie(dot)name>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Q: Escapes in jsonpath Idents
Date: 2024-03-19 14:28:06
Message-ID: D1AA2802-98E2-4638-B9B0-512C40AC4B49@justatheory.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mar 17, 2024, at 20:09, Erik Wienhold <ewie(at)ewie(dot)name> wrote:
>
> On 2024-03-17 20:50 +0100, David E. Wheeler wrote:
>> On Mar 17, 2024, at 15:12, Erik Wienhold <ewie(at)ewie(dot)name> wrote:
>>> So I think it makes sense to reword the entire backslash part of the
>>> paragraph and remove references to JSON entirely. The attached patch
>>> does that and also formats the backslash escapes as a bulleted list for
>>> readability.
>>
>> Ah, it’s JavaScript format, not JSON! This does clarify things quite
>> nicely, thank you. Happy to add my review once it’s in a commit fest.
>
> Thanks. https://commitfest.postgresql.org/48/4899/

Applies cleanly, `make -C doc/src/sgml check` runs without error. Doc improvement welcome and much clearer than before.

> I had the same reasoning while writing my first reply but scrapped that
> part because I found it obvious: That jsonpath is parsed before calling
> jsonb_path_exists() and therefore the parser has no context about any
> variables, which might not even be hardcoded but may result from a
> query.

Right, there’s a chicken/egg problem.

> Unfortunately, I don't have access to that part of the SQL spec. So I
> don't know how the jsonpath grammar is specified.

Seems quite logical; I think it should be documented, but I’d also be interested to know what the 2016 and 2023 standards say, exactly.

> Also checked git log src/backend/utils/adt/jsonpath_scan.l for some
> insights but haven't found any yet.

Everybody’s taking shortcuts relative to the standard, AFAICT. For example, jsonpath_scan.l matches unqouted identifiers with these two regular expressions:

<xnq>{other}+
<xnq>\/\*
<xnq,xq,xvq>\\.

Plus the backslash escapes. {other} is defined as:

/* "other" means anything that's not special, blank, or '\' or '"' */
other [^\?\%\$\.\[\]\{\}\(\)\|\&\!\=\<\>\(at)\#\,\*:\-\+\/\\\" \t\n\r\f]

Which is waaaay more liberal than the ECMA standard[1], by my reading, but the MSDN[2] description is quite succinct (thanks for the links!):

> In JavaScript, identifiers are commonly made of alphanumeric characters, underscores (_), and dollar signs ($). Identifiers are not allowed to start with numbers. However, JavaScript identifiers are not only limited to ASCII — many Unicode code points are allowed as well. Namely, any character in the ID_Start category can start an identifier, while any character in the ID_Continue category can appear after the first character.

ID_Start[3] and ID_Continue[4] point to the unicode standard codes lister, nether of which reference Emoji. Sure enough, in Safari:

> x = {"🎉": true}
< {🎉: true}
> x.🎉
< SyntaxError: Invalid character '\ud83c’

But in Postgres jsonpath:

david=# select '$.🎉'::jsonpath;
jsonpath
----------
$."🎉"

If the MSDN references to ID_Start and ID_Continue are correct, then the Postgres path parser is being overly-liberal. Maybe that’s totally fine? Not sure what should be documented and what’s not worth it.

Aside: I’m only digging into these details because I’m busy porting the path parser, so trying to figure out where to be compatible and where not to. So far I’m rejecting '$' (but allowing '\$' and '\u0024') but taking advantage of the unicode support in Go to specifically validate against ID_Start and ID_Continue.

Best,

David

[1] https://262.ecma-international.org/#sec-identifier-names
[2] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#identifiers
[3] https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BID_Start%7D
[4] https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BID_Continue%7D

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message stephane tachoires 2024-03-19 14:29:47 Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
Previous Message Japin Li 2024-03-19 14:26:06 Re: Table AM Interface Enhancements