From: | "David E(dot) Wheeler" <david(at)justatheory(dot)com> |
---|---|
To: | Erik Wienhold <ewie(at)ewie(dot)name> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Q: Escapes in jsonpath Idents |
Date: | 2024-03-19 14:28:06 |
Message-ID: | D1AA2802-98E2-4638-B9B0-512C40AC4B49@justatheory.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mar 17, 2024, at 20:09, Erik Wienhold <ewie(at)ewie(dot)name> wrote:
>
> On 2024-03-17 20:50 +0100, David E. Wheeler wrote:
>> On Mar 17, 2024, at 15:12, Erik Wienhold <ewie(at)ewie(dot)name> wrote:
>>> So I think it makes sense to reword the entire backslash part of the
>>> paragraph and remove references to JSON entirely. The attached patch
>>> does that and also formats the backslash escapes as a bulleted list for
>>> readability.
>>
>> Ah, it’s JavaScript format, not JSON! This does clarify things quite
>> nicely, thank you. Happy to add my review once it’s in a commit fest.
>
> Thanks. https://commitfest.postgresql.org/48/4899/
Applies cleanly, `make -C doc/src/sgml check` runs without error. Doc improvement welcome and much clearer than before.
> I had the same reasoning while writing my first reply but scrapped that
> part because I found it obvious: That jsonpath is parsed before calling
> jsonb_path_exists() and therefore the parser has no context about any
> variables, which might not even be hardcoded but may result from a
> query.
Right, there’s a chicken/egg problem.
> Unfortunately, I don't have access to that part of the SQL spec. So I
> don't know how the jsonpath grammar is specified.
Seems quite logical; I think it should be documented, but I’d also be interested to know what the 2016 and 2023 standards say, exactly.
> Also checked git log src/backend/utils/adt/jsonpath_scan.l for some
> insights but haven't found any yet.
Everybody’s taking shortcuts relative to the standard, AFAICT. For example, jsonpath_scan.l matches unqouted identifiers with these two regular expressions:
<xnq>{other}+
<xnq>\/\*
<xnq,xq,xvq>\\.
Plus the backslash escapes. {other} is defined as:
/* "other" means anything that's not special, blank, or '\' or '"' */
other [^\?\%\$\.\[\]\{\}\(\)\|\&\!\=\<\>\(at)\#\,\*:\-\+\/\\\" \t\n\r\f]
Which is waaaay more liberal than the ECMA standard[1], by my reading, but the MSDN[2] description is quite succinct (thanks for the links!):
> In JavaScript, identifiers are commonly made of alphanumeric characters, underscores (_), and dollar signs ($). Identifiers are not allowed to start with numbers. However, JavaScript identifiers are not only limited to ASCII — many Unicode code points are allowed as well. Namely, any character in the ID_Start category can start an identifier, while any character in the ID_Continue category can appear after the first character.
ID_Start[3] and ID_Continue[4] point to the unicode standard codes lister, nether of which reference Emoji. Sure enough, in Safari:
> x = {"🎉": true}
< {🎉: true}
> x.🎉
< SyntaxError: Invalid character '\ud83c’
But in Postgres jsonpath:
david=# select '$.🎉'::jsonpath;
jsonpath
----------
$."🎉"
If the MSDN references to ID_Start and ID_Continue are correct, then the Postgres path parser is being overly-liberal. Maybe that’s totally fine? Not sure what should be documented and what’s not worth it.
Aside: I’m only digging into these details because I’m busy porting the path parser, so trying to figure out where to be compatible and where not to. So far I’m rejecting '$' (but allowing '\$' and '\u0024') but taking advantage of the unicode support in Go to specifically validate against ID_Start and ID_Continue.
Best,
David
[1] https://262.ecma-international.org/#sec-identifier-names
[2] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#identifiers
[3] https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BID_Start%7D
[4] https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BID_Continue%7D
From | Date | Subject | |
---|---|---|---|
Next Message | stephane tachoires | 2024-03-19 14:29:47 | Re: Add SPLIT PARTITION/MERGE PARTITIONS commands |
Previous Message | Japin Li | 2024-03-19 14:26:06 | Re: Table AM Interface Enhancements |