Re: WIP Incremental JSON Parser

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Amit Langote <amitlangote09(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: WIP Incremental JSON Parser
Date: 2024-04-10 11:47:38
Message-ID: 98787c69-7581-456a-b726-fc93416f0b15@dunslane.net
Lists: pgsql-hackers


On 2024-04-09 Tu 15:42, Andrew Dunstan wrote:
>
> On 2024-04-09 Tu 07:54, Andrew Dunstan wrote:
>>
>>
>> On 2024-04-09 Tu 01:23, Michael Paquier wrote:
>>> On Tue, Apr 09, 2024 at 09:48:18AM +0900, Michael Paquier wrote:
>>>> There is no direct check on test_json_parser_perf.c, either, only a
>>>> custom rule in the Makefile without specifying something for meson.
>>>> So it looks like you could do a short execution check in a TAP
>>>> test, at least.
>>
>>
>> Not adding a test for that was deliberate - any sane test takes a
>> while, and I didn't want to spend that much time on it every time
>> someone runs "make check-world" or equivalent. However, adding a test
>> to run it with a trivial number of iterations seems reasonable, so
>> I'll add that. I'll also add a meson target for the binary.
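
A hedged sketch of the shape such a check could take. The jsonapi
calls (makeJsonLexContextCstringLen, pg_parse_json, nullSemAction,
freeJsonLexContext) are my reading of src/common/jsonapi.h, and
parse_file_n_times is a hypothetical helper, not the actual tool's
code:

#include "postgres_fe.h"
#include "common/jsonapi.h"
#include "mb/pg_wchar.h"

/* run the recursive-descent parser over the same input a few times */
static bool
parse_file_n_times(const char *str, size_t len, int iterations)
{
    for (int i = 0; i < iterations; i++)
    {
        JsonLexContext *lex;
        JsonParseErrorType result;

        /* NULL first argument: let jsonapi allocate the context */
        lex = makeJsonLexContextCstringLen(NULL, str, len, PG_UTF8, false);
        result = pg_parse_json(lex, &nullSemAction);
        freeJsonLexContext(lex);
        if (result != JSON_SUCCESS)
            return false;
    }
    return true;
}

With iterations set to 2 or so, the cost per check-world run is
negligible.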
>>
>>
>>> While reading the code, I've noticed a few things, as well:
>>>
>>> +   /* max delicious line length is less than this */
>>> +   char        buff[6001];
>>>
>>> "Delicious" applies to the contents and has nothing to do with this
>>> code, even if I'd like to think that these lines of code are edible
>>> and good.  Using a hardcoded limit for some JSON input sounds like a
>>> bad idea to me even if this is a test module.  That comment applies
>>> to both the perf and the incremental tools.  You could use a
>>> #define'd buffer size for readability rather than assuming this
>>> value in many places.
>>
>>
>> The comment is a remnant of when I hadn't yet added support for
>> incomplete tokens, and so I had to parse the input line by line. I
>> agree it can go, and we can use a manifest constant for the buffer size.
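
Something along these lines, as a minimal standalone sketch (BUFSIZE
and the chunked read are illustrative, not the committed change):

#include <stdio.h>

/* name the buffer size once instead of hardcoding 6001 in many places */
#define BUFSIZE 6001

int
main(int argc, char **argv)
{
    char    buff[BUFSIZE];
    FILE   *json_file;
    size_t  n;

    if (argc < 2 || (json_file = fopen(argv[1], "r")) == NULL)
        return 1;

    /* read fixed-size chunks; sizeof(buff) keeps the size in one place */
    while ((n = fread(buff, 1, sizeof(buff), json_file)) > 0)
        fwrite(buff, 1, n, stdout);

    fclose(json_file);
    return 0;
}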
>>
>>
>>> +++ b/src/test/modules/test_json_parser/test_json_parser_incremental.c
>>> + * This progam tests incremental parsing of json. The input is fed into
>>> + * full range of incement handling, especially in the lexer, is exercised.
>>> +++ b/src/test/modules/test_json_parser/test_json_parser_perf.c
>>> + *    Performancet est program for both flavors of the JSON parser
>>> + * This progam tests either the standard (recursive descent) JSON parser
>>> +++ b/src/test/modules/test_json_parser/README
>>> +  reads in a file and pases it in very small chunks (60 bytes at a time) to
>>>
>>> Collection of typos across various files.
>>
>>
>> Will fix. (The older I get, the more typos I seem to make, and the
>> harder it is to notice them. It's very annoying.)
>>
>>
>>> + appendStringInfoString(&json, "1+23 trailing junk");
>>> What's the purpose here?  Perhaps the intention should be documented
>>> in a comment?
>>
>>
>> The purpose is to ensure that if there is not a trailing '\0' on the
>> json chunk, the parser will still do the right thing. I'll add a
>> comment to that effect.
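
Roughly like this. It's a hedged sketch: the comment wording and the
incremental-API signatures (makeJsonLexContextIncremental,
pg_parse_json_incremental) are my assumptions, not the committed code:

#include "postgres_fe.h"
#include "common/jsonapi.h"
#include "lib/stringinfo.h"
#include "mb/pg_wchar.h"

int
main(void)
{
    JsonLexContext lex;
    StringInfoData json;
    JsonParseErrorType result;

    initStringInfo(&json);
    appendStringInfoString(&json, "[1, 2]");

    /*
     * Deliberate garbage after the valid document: if the lexer relied
     * on a trailing '\0' instead of the chunk length, it would read
     * into this.
     */
    appendStringInfoString(&json, "1+23 trailing junk");

    makeJsonLexContextIncremental(&lex, PG_UTF8, false);

    /* parse only the first 6 bytes, i.e. the valid "[1, 2]" */
    result = pg_parse_json_incremental(&lex, &nullSemAction,
                                       json.data, 6, true);
    return result == JSON_SUCCESS ? 0 : 1;
}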
>>
>>
>>> At the end, having a way to generate JSON blobs randomly to test this
>>> stuff would be more appealing than what you have currently, with
>>> perhaps a few factors like:
>>> - Array and simple object density.
>>> - Max Level of nesting.
>>> - Limitation to ASCII characters that can be escaped.
>>> - Perhaps more things I cannot think about?
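
For concreteness, here is the kind of standalone generator that
suggestion describes, with knobs for container density, maximum
nesting, and ASCII-only escapes. Everything in it is illustrative and
not part of the patch set:

#include <stdio.h>
#include <stdlib.h>

static int  max_depth = 10;     /* maximum nesting level */
static int  fanout = 4;         /* members/elements per container */
static int  container_pct = 50; /* chance a value is an object/array */

static void
emit_scalar(void)
{
    switch (rand() % 3)
    {
        case 0:
            printf("%d", rand() % 1000);
            break;
        case 1:
            /* restrict escapes to printable ASCII (U+0020..U+007E) */
            printf("\"s\\u%04x\"", 0x20 + rand() % 0x5f);
            break;
        default:
            printf(rand() % 2 ? "true" : "null");
            break;
    }
}

static void
emit_value(int depth)
{
    if (depth < max_depth && rand() % 100 < container_pct)
    {
        int     is_obj = rand() % 2;

        putchar(is_obj ? '{' : '[');
        for (int i = 0; i < fanout; i++)
        {
            if (i > 0)
                putchar(',');
            if (is_obj)
                printf("\"k%d\":", i);
            emit_value(depth + 1);
        }
        putchar(is_obj ? '}' : ']');
    }
    else
        emit_scalar();
}

int
main(int argc, char **argv)
{
    /* fixed seed by default, so a given invocation is reproducible */
    srand(argc > 1 ? (unsigned) atoi(argv[1]) : 0);
    emit_value(0);
    putchar('\n');
    return 0;
}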
>>
>>
>> No, I disagree. Maybe we need those things as well, but we do need a
>> static test where we can test the output against known results. I
>> have no objection to changing the input and output files.
>>
>> It's worth noting that t/002_inline.pl does generate some input and
>> tests, e.g., exceeding the maximum nesting level, among other errors.
>> Perhaps you missed that. If you think we need more tests there, adding
>> them would be extremely simple.
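
To illustrate what that nesting test exercises, a hedged sketch of
driving the incremental API past its depth limit. The DEPTH value and
the jsonapi signatures (makeJsonLexContextIncremental,
pg_parse_json_incremental, json_errdetail, JSON_INCOMPLETE) are
assumptions from my reading of the patch series:

#include "postgres_fe.h"
#include "common/jsonapi.h"
#include "lib/stringinfo.h"
#include "mb/pg_wchar.h"

#define DEPTH 20000             /* assumed to exceed the nesting limit */
#define CHUNK 60                /* tiny chunks, as in the README */

int
main(void)
{
    JsonLexContext lex;
    StringInfoData json;
    JsonParseErrorType res = JSON_INCOMPLETE;

    /* build an over-nested document: DEPTH unclosed '['s */
    initStringInfo(&json);
    for (int i = 0; i < DEPTH; i++)
        appendStringInfoChar(&json, '[');

    makeJsonLexContextIncremental(&lex, PG_UTF8, false);

    for (int off = 0; off < json.len; off += CHUNK)
    {
        int     len = Min(CHUNK, json.len - off);
        bool    is_last = (off + len >= json.len);

        res = pg_parse_json_incremental(&lex, &nullSemAction,
                                        json.data + off, len, is_last);
        /* non-final chunks report JSON_INCOMPLETE until something fails */
        if (res != JSON_INCOMPLETE)
            break;
    }

    /* expect a nesting-depth error here, not success */
    fprintf(stderr, "%s\n", json_errdetail(res, &lex));
    return res == JSON_SUCCESS ? 1 : 0;
}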
>>
>>
>>> So the current state of things is kind of disappointing, and the size
>>> of the data set added to the tree is not that portable either if you
>>> want to test various scenarios depending on the data set.  It seems to
>>> me that this has been committed too hastily and that this is not ready
>>> for integration, even if that's just a test module.  Tom also has
>>> shared some concerns upthread, as far as I can see.
>>
>>
>> I have posted a patch already that addresses the issue Tom raised.
>>
>>
>
>
> Here's a consolidated set of cleanup patches, including the memory
> leak patch and Jacob's shrink-tiny patch.

Here's v2 of cleanup patch 4, which fixes some more typos kindly
pointed out to me by Alexander Lakhin.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Attachment Content-Type Size
v2-0004-Assorted-minor-cleanups-in-the-test_json_parser-m.patch text/x-patch 8.6 KB
