From: | Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> |
Subject: | Re: Fix parsing of identifiers in jsonpath |
Date: | 2019-10-02 13:10:18 |
Message-ID: | f6b0228f-71c4-2d21-68c0-dcfa110d18ed@postgrespro.ru |
Lists: | pgsql-hackers |
Attached is the v2 patch, rebased onto current master.
On 18.09.2019 18:10, Nikita Glukhov wrote:
> Unfortunately, the jsonpath lexer, in contrast to the jsonpath parser, was written by
> Teodor and me without proper attention to the standard. The JSON path lexical grammar
> is borrowed from the external ECMAScript standard [1], and we did not study it carefully.
>
> There were numerous deviations from the ECMAScript standard in our jsonpath
> implementation that were mostly fixed in the attached patch:
>
> 1. Identifiers (unquoted JSON key names) should start with one of the following (see [2]):
> - Unicode symbol having Unicode property "ID_Start" (see [3])
> - Unicode escape sequence '\uXXXX' or '\u{X...}'
> - '$'
> - '_'
>
> And they should continue with one of:
> - Unicode symbol having Unicode property "ID_Continue" (see [3])
> - Unicode escape sequence
> - '$'
> - ZWNJ
> - ZWJ
>
> 2. '$' is also allowed inside the identifiers, so it is possible to write
> something like '$.a$$b'.
>
> 3. Variable references '$var' are regular identifiers that simply start with the
> '$' sign, and there is no syntax like '$"var"', because quotes are not
> allowed in identifiers.
>
> 4. Even if the Unicode escape sequence '\uXXXX' is used, it cannot produce
> special symbols or whitespace, because identifiers are displayed without
> quoting (i.e. '$\u{20}' cannot be displayed as '$" "', let alone as the
> string '"$ "').
>
> 5. All codepoints in '\u{XXXXXX}' greater than 0x10FFFF should be forbidden.
>
> 6. The six single-character escape sequences (\b \t \r \f \n \v) should only be
> supported inside quoted strings.
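
Point 5 above can be illustrated with a minimal Python sketch. It is not taken from the patch (the real lexer is a flex scanner in C); the name decode_unicode_escape and the error messages are hypothetical. It decodes one '\uXXXX' or '\u{X...}' escape and rejects code points above 0x10FFFF. Surrogate handling is deliberately left out.

```python
import re

MAX_CODEPOINT = 0x10FFFF  # highest valid Unicode code point

# Matches \uXXXX (exactly four hex digits) or \u{X...} (one or more hex digits).
_ESCAPE_RE = re.compile(r"\\u(?:([0-9A-Fa-f]{4})|\{([0-9A-Fa-f]+)\})")


def decode_unicode_escape(text, pos=0):
    """Decode one escape starting at text[pos]; return (character, next position).

    Hypothetical helper for illustration only, not the patch's code.
    """
    m = _ESCAPE_RE.match(text, pos)
    if m is None:
        raise ValueError("invalid Unicode escape sequence")
    codepoint = int(m.group(1) or m.group(2), 16)
    if codepoint > MAX_CODEPOINT:
        raise ValueError("Unicode code point out of range")
    return chr(codepoint), m.end()
```

For example, decode_unicode_escape(r"\u{110000}") raises ValueError, while decode_unicode_escape(r"\u{10FFFF}") is accepted.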
>
>
> I don't know if it is possible to check the Unicode properties "ID_Start" and
> "ID_Continue" in Postgres, or what ZWNJ/ZWJ is. For now, the identifier's starting
> character set is simply determined by excluding all recognized special
> characters.
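
For reference, ZWNJ is U+200C (ZERO WIDTH NON-JOINER) and ZWJ is U+200D (ZERO WIDTH JOINER). The following Python sketch shows roughly what the identifier rules in point 1 look like when "ID_Start" and "ID_Continue" are approximated by general categories, as [3] derives them. It is an assumption-laden illustration, not the patch's approach: it ignores Other_ID_Start, Other_ID_Continue and the Pattern_Syntax/Pattern_White_Space exclusions, does not handle escape sequences inside identifiers, and all function names are hypothetical.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER

# Approximation of ID_Start: letters (Lu, Ll, Lt, Lm, Lo) and letter numbers (Nl).
_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
# Approximation of ID_Continue: ID_Start plus marks, decimal digits and
# connector punctuation (which includes '_').
_CONTINUE_CATEGORIES = _START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def is_identifier_start(ch):
    """True if the single character ch may begin an identifier."""
    return ch in ("$", "_") or unicodedata.category(ch) in _START_CATEGORIES


def is_identifier_part(ch):
    """True if the single character ch may continue an identifier."""
    return ch in ("$", ZWNJ, ZWJ) or unicodedata.category(ch) in _CONTINUE_CATEGORIES


def is_identifier(name):
    """True if name is a non-empty identifier under the approximated rules."""
    return (
        bool(name)
        and is_identifier_start(name[0])
        and all(is_identifier_part(c) for c in name[1:])
    )
```

Under these rules is_identifier("a$$b") and is_identifier("$var") hold, while a name that starts with a digit or contains a space is rejected.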
>
>
> The patch is not so simple, but I believe that it's not too late to fix v12.
>
>
> [1] https://www.ecma-international.org/ecma-262/10.0/index.html#sec-ecmascript-language-lexical-grammar
> [2] https://www.ecma-international.org/ecma-262/10.0/index.html#sec-names-and-keywords
> [3] https://unicode.org/reports/tr31/
--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment | Content-Type | Size |
---|---|---|
0001-Fix-parsing-of-identifiers-in-jsonpath-v02.patch | text/x-patch | 16.0 KB |