Quick Links

Re: WIP Incremental JSON Parser

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: WIP Incremental JSON Parser
Date:	2024-03-19 22:07:21
Message-ID:	CAD5tBcLi2ffZkktV2qrsKSBykE-N8CiYgrfbv0vZ-F7=xLFeqw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Mon, Mar 18, 2024 at 3:35 PM Jacob Champion <
jacob(dot)champion(at)enterprisedb(dot)com> wrote:

> On Mon, Mar 18, 2024 at 3:32 AM Andrew Dunstan <andrew(at)dunslane(dot)net>
> wrote:
> > Not very easily. But I think and hope I've fixed the issue you've
> identified above about returning before lex->token_start is properly set.
> >
> > Attached is a new set of patches that does that and is updated for the
> json_errdetaiil() changes.
>
> Thanks!
>
> > ++ * Normally token_start would be ptok->data, but it could
> be later,
> > ++ * see json_lex_string's handling of invalid escapes.
> > + */
> > -+ lex->token_start = ptok->data;
> > ++ lex->token_start = dummy_lex.token_start;
> > + lex->token_terminator = ptok->data + ptok->len;
>
> By the same token (ha), the lex->token_terminator needs to be updated
> from dummy_lex for some error paths. (IIUC, on success, the
> token_terminator should always point to the end of the buffer. If it's
> not possible to combine the two code paths, maybe it'd be good to
> check that and assert/error out if we've incorrectly pulled additional
> data into the partial token.)
>

Yes, good point. Will take a look at that.

>
> With the incremental parser, I think prev_token_terminator is not
> likely to be safe to use except in very specific circumstances, since
> it could be pointing into a stale chunk. Some documentation around how
> to use that safely in a semantic action would be good.
>

Quite right. It's not safe. Should we ensure it's set to something like
NULL or -1?

Also, where do you think we should put a warning about it?

>
> It looks like some of the newly added error handling paths cannot be
> hit, because the production stack makes it logically impossible to get
> there. (For example, if it takes a successfully lexed comma to
> transition into JSON_PROD_MORE_ARRAY_ELEMENTS to begin with, then when
> we pull that production's JSON_TOKEN_COMMA off the stack, we can't
> somehow fail to match that same comma.) Assuming I haven't missed a
> different way to get into that situation, could the "impossible" cases
> have assert calls added?
>

Good idea.

>
> I've attached two diffs. One is the group of tests I've been using
> locally (called 002_inline.pl; I replaced the existing inline tests
> with it), and the other is a set of potential fixes to get those tests
> green.
>
>
>

Thanks. Here's a patch set that incorporates your two patches.

It also removes the frontend exits I had. In the case of stack depth, we
follow the example of the RD parser and only check stack depth for backend
code. In the case of the check that the lexer is set up for incremental
parsing, the exit is replaced by an Assert. That means your test for an
over-nested array doesn't work any more, so I have commented it out.

cheers

andrew

Attachment	Content-Type	Size
v12-0001-Introduce-a-non-recursive-JSON-parser.patch	text/x-patch	427.7 KB
v12-0002-Add-support-for-incrementally-parsing-backup-man.patch	text/x-patch	7.6 KB
v12-0003-Use-incremental-parsing-of-backup-manifests.patch	text/x-patch	11.9 KB

In response to

Re: WIP Incremental JSON Parser at 2024-03-18 19:34:51 from Jacob Champion

Responses

Re: WIP Incremental JSON Parser at 2024-03-20 07:19:39 from Andrew Dunstan

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2024-03-19 22:20:59	Re: Improving EXPLAIN's display of SubPlan nodes
Previous Message	Dean Rasheed	2024-03-19 21:49:07	Re: Improving EXPLAIN's display of SubPlan nodes