Re: WIP Incremental JSON Parser

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Amit Langote <amitlangote09(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: WIP Incremental JSON Parser
Date: 2024-04-09 11:54:25
Message-ID: c89463ab-6ca4-4d7c-abea-791773aeb033@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 2024-04-09 Tu 01:23, Michael Paquier wrote:
> On Tue, Apr 09, 2024 at 09:48:18AM +0900, Michael Paquier wrote:
>> There is no direct check on test_json_parser_perf.c, either, only a
>> custom rule in the Makefile without specifying something for meson.
>> So it looks like you could do short execution check in a TAP test, at
>> least.

Not adding a test for that was deliberate - any sane test takes a while,
and I didn't want to spend that much time on it every time someone runs
"make check-world" or equivalent. However, adding a test to run it with
a trivial number of iterations seems reasonable, so I'll add that. I'll
also add a meson target for the binary.

> While reading the code, I've noticed a few things, as well:
>
> + /* max delicious line length is less than this */
> + char buff[6001];
>
> Delicious applies to the contents, nothing to do with this code even
> if I'd like to think that these lines of code are edible and good.
> Using a hardcoded limit for some JSON input sounds like a bad idea to
> me even if this is a test module. Comment that applies to both the
> perf and the incremental tools. You could use a #define'd buffer size
> for readability rather than assuming this value in many places.

The comment is a remnant of when I hadn't yet added support for
incomplete tokens, and so I had to parse the input line by line. I agree
it can go, and we can use a manifest constant for the buffer size.

>
> +++ b/src/test/modules/test_json_parser/test_json_parser_incremental.c
> + * This progam tests incremental parsing of json. The input is fed into
> + * full range of incement handling, especially in the lexer, is exercised.
> +++ b/src/test/modules/test_json_parser/test_json_parser_perf.c
> + * Performancet est program for both flavors of the JSON parser
> + * This progam tests either the standard (recursive descent) JSON parser
> +++ b/src/test/modules/test_json_parser/README
> + reads in a file and pases it in very small chunks (60 bytes at a time) to
>
> Collection of typos across various files.

Will fix. (The older I get the more typos I seem to make and the harder
it is to notice them. It's very annoying.)

>
> + appendStringInfoString(&json, "1+23 trailing junk");
> What's the purpose here? Perhaps the intention should be documented
> in a comment?

The purpose is to ensure that if there is not a trailing '\0' on the
json chunk the parser will still do the right thing. I'll add a comment
to that effect.

>
> At the end, having a way to generate JSON blobs randomly to test this
> stuff would be more appealing than what you have currently, with
> perhaps a few factors like:
> - Array and simple object density.
> - Max Level of nesting.
> - Limitation to ASCII characters that can be escaped.
> - Perhaps more things I cannot think about?

No, I disagree. Maybe we need those things as well, but we do need a
static test where we can test the output against known results. I have
no objection to changing the input and output files.

It's worth noting that t/002_inline.pl does generate some input and test
e.g., the maximum nesting levels among other errors. Perhaps you missed
that. If you think we need more tests there adding them would be
extremely simple.

>
> So the current state of things is kind of disappointing, and the size
> of the data set added to the tree is not that portable either if you
> want to test various scenarios depending on the data set. It seems to
> me that this has been committed too hastily and that this is not ready
> for integration, even if that's just a test module. Tom also has
> shared some concerns upthread, as far as I can see.

I have posted a patch already that addresses the issue Tom raised.

cheers

andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ranier Vilela 2024-04-09 12:02:55 Re: Flushing large data immediately in pqcomm
Previous Message Ole Peder Brandtzæg 2024-04-09 11:41:15 Re: NumericShort vs NumericLong format