Re: WIP Incremental JSON Parser

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Amit Langote <amitlangote09(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: WIP Incremental JSON Parser
Date: 2024-04-09 19:42:28
Message-ID: 1a4d8a4c-ca76-4d58-8d8e-608ad8c6e263@dunslane.net
Lists: pgsql-hackers


On 2024-04-09 Tu 07:54, Andrew Dunstan wrote:
>
>
> On 2024-04-09 Tu 01:23, Michael Paquier wrote:
>> On Tue, Apr 09, 2024 at 09:48:18AM +0900, Michael Paquier wrote:
>>> There is no direct check on test_json_parser_perf.c either, only a
>>> custom rule in the Makefile, with nothing equivalent for meson. So it
>>> looks like you could do a short execution check in a TAP test, at
>>> least.
>
>
> Not adding a test for that was deliberate - any sane test takes a
> while, and I didn't want to spend that much time on it every time
> someone runs "make check-world" or equivalent. However, adding a test
> to run it with a trivial number of iterations seems reasonable, so
> I'll add that. I'll also add a meson target for the binary.
>
>
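Roughly the shape I have in mind, sketched from memory (the sample file
name, the perf binary's argument order, and the iteration count are all
placeholders, and I'm assuming the binary ends up on the test's PATH):

use strict;
use warnings;
use FindBin;
use PostgreSQL::Test::Utils;
use Test::More;

# a small sample file shipped with the module (placeholder name)
my $test_file = "$FindBin::RealBin/../tiny.json";

# A trivial iteration count: enough to prove the binary builds and runs
# end to end without noticeably slowing down check-world.
command_ok([ 'test_json_parser_perf', 100, $test_file ],
	'perf test program runs successfully');

done_testing();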
>> While reading the code, I've noticed a few things, as well:
>>
>> + /* max delicious line length is less than this */
>> + char buff[6001];
>>
>> "Delicious" applies to the contents and has nothing to do with this
>> code, even if I'd like to think that these lines of code are edible
>> and good. Using a hardcoded limit for some JSON input sounds like a
>> bad idea to me, even if this is a test module. This comment applies
>> to both the perf and the incremental tools. You could use a #define'd
>> buffer size for readability rather than assuming this value in many
>> places.
>
>
> The comment is a remnant of when I hadn't yet added support for
> incomplete tokens, and so I had to parse the input line by line. I
> agree it can go, and we can use a manifest constant for the buffer size.
>
>
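For illustration, roughly what that cleanup looks like, with the reading
loop cut down to its bare bones (BUFSIZE is a stand-in name, and 6001 is
kept just for the example):

#include <stdio.h>

/* one place to change the limit, instead of a magic 6001 everywhere */
#define BUFSIZE 6001

int
main(int argc, char **argv)
{
	char	buff[BUFSIZE];
	FILE   *json_file;

	if (argc < 2)
		return 1;

	json_file = fopen(argv[1], "r");
	if (json_file == NULL)
		return 1;

	/* sizeof(buff) tracks BUFSIZE if the define ever changes */
	while (fgets(buff, sizeof(buff), json_file) != NULL)
	{
		/* feed the line (or chunk) to the parser here */
	}

	fclose(json_file);
	return 0;
}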
>> +++ b/src/test/modules/test_json_parser/test_json_parser_incremental.c
>> + * This progam tests incremental parsing of json. The input is fed into
>> + * full range of incement handling, especially in the lexer, is exercised.
>> +++ b/src/test/modules/test_json_parser/test_json_parser_perf.c
>> + * Performancet est program for both flavors of the JSON parser
>> + * This progam tests either the standard (recursive descent) JSON parser
>> +++ b/src/test/modules/test_json_parser/README
>> + reads in a file and pases it in very small chunks (60 bytes at a time) to
>>
>> Collection of typos across various files.
>
>
> Will fix. (The older I get the more typos I seem to make and the
> harder it is to notice them. It's very annoying.)
>
>
>> + appendStringInfoString(&json, "1+23 trailing junk");
>> What's the purpose here? Perhaps the intention should be documented
>> in a comment?
>
>
> The purpose is to ensure that if there is not a trailing '\0' on the
> JSON chunk, the parser will still do the right thing. I'll add a
> comment to that effect.
>
>
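In other words, the test does something like the following, where
parse_chunk() is a hypothetical stand-in for the incremental parser's
entry point, just to show the shape of the thing:

#include "postgres_fe.h"
#include "lib/stringinfo.h"

/* hypothetical stand-in for the incremental parser entry point */
extern bool parse_chunk(const char *buf, int len, bool is_last);

static void
feed_chunk_with_junk(void)
{
	StringInfoData json;
	int			chunk_len;

	initStringInfo(&json);
	appendStringInfoString(&json, "[1, 2,");
	chunk_len = json.len;

	/*
	 * Deliberately append garbage past the end of the chunk.  A parser
	 * that trusts a trailing '\0' instead of the length we hand it will
	 * wander into the junk and fail visibly.
	 */
	appendStringInfoString(&json, "1+23 trailing junk");

	/* only the first chunk_len bytes are real input */
	(void) parse_chunk(json.data, chunk_len, false);
}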
>> At the end, having a way to generate JSON blobs randomly to test this
>> stuff would be more appealing than what you have currently, with
>> perhaps a few factors like:
>> - Array and simple object density.
>> - Max Level of nesting.
>> - Limitation to ASCII characters that can be escaped.
>> - Perhaps more things I cannot think about?
>
>
> No, I disagree. Maybe we need those things as well, but we do need a
> static test where we can test the output against known results. I have
> no objection to changing the input and output files.
>
> It's worth noting that t/002_inline.pl does generate some input and
> tests, for example, the maximum nesting level among other error cases.
> Perhaps you missed that. If you think we need more tests there, adding
> them would be extremely simple.
>
>
>> So the current state of things is kind of disappointing, and the size
>> of the data set added to the tree is not that portable either if you
>> want to test various scenarios depending on the data set. It seems to
>> me that this has been committed too hastily and that this is not ready
>> for integration, even if that's just a test module. Tom also has
>> shared some concerns upthread, as far as I can see.
>
>
> I have posted a patch already that addresses the issue Tom raised.
>
>

Here's a consolidated set of cleanup patches, including the memory leak
patch and Jacob's shrink-tiny patch.

I think the biggest open issue is whether or not we remove the
performance test program.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Attachment Content-Type Size
0001-Fix-some-memory-leaks-associated-with-parsing-json-a.patch.gz application/gzip 1.7 KB
0002-Add-a-TAP-test-for-test_json_parser_perf.patch.gz application/gzip 962 bytes
0003-Shrink-test-file-for-test_json_parser-module.patch.gz application/gzip 111.5 KB
0004-Assorted-minor-cleanups-in-the-test_json_parser-modu.patch.gz application/gzip 2.2 KB
