Re: WIP Incremental JSON Parser

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: WIP Incremental JSON Parser
Date: 2024-04-02 21:12:51
Message-ID: 8447d168-b981-2601-8ad0-53827fe18e5a@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 2024-04-02 Tu 15:38, Jacob Champion wrote:
> On Mon, Apr 1, 2024 at 4:53 PM Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> Anyway, here are new patches. I've rolled the new semantic test into the
>> first patch.
> Looks good! I've marked RfC.

Thanks! I appreciate all the work you've done on this. I will give it
one more pass and commit RSN.

>
>> json_lex() is not really a very hot piece of code.
> Sure, but I figure if someone is trying to get the performance of the
> incremental parser to match the recursive one, so we can eventually
> replace it, it might get a little warmer. :)

I don't think this is where the performance penalty lies. Rather, I
suspect it's the stack operations in the non-recursive parser itself.
The speed test doesn't involve any partial token processing at all, and
yet the non-recursive parser is significantly slower in that test.

>>> I think it'd be good for a v1.x of this feature to focus on
>>> simplification of the code, and hopefully consolidate and/or eliminate
>>> some of the duplicated parsing work so that the mental model isn't
>>> quite so big.
>> I'm not sure how you think that can be done.
> I think we'd need to teach the lower levels of the lexer about
> incremental parsing too, so that we don't have two separate sources of
> truth about what ends a token. Bonus points if we could keep the parse
> state across chunks to the extent that we didn't need to restart at
> the beginning of the token every time. (Our current tools for this are
> kind of poor, like the restartable state machine in PQconnectPoll.
> While I'm dreaming, I'd like coroutines.) Now, whether the end result
> would be more or less maintainable is left as an exercise...
>

I tried to disturb the main lexer processing as little as possible. We
could possibly unify the two paths, but I have a strong suspicion that
would result in a performance hit (the main part of the lexer doesn't
copy any characters at all, it just keeps track of pointers into the
input). And while the code might not undergo lots of change, the routine
itself is quite performance critical.

Anyway, I think that's all something for another day.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-04-02 21:13:20 Re: Fix out-of-bounds in the function PQescapeinternal (src/interfaces/libpq/fe-exec.c)
Previous Message Daniel Verite 2024-04-02 20:54:47 Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs