From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | Jacob Champion <champion(dot)p(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Subject: | Re: WIP Incremental JSON Parser |
Date: | 2024-01-15 15:48:00 |
Message-ID: | 126d90b3-140e-5b99-eefb-b670c493f2d0@dunslane.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2024-01-09 Tu 13:46, Jacob Champion wrote:
> On Tue, Dec 26, 2023 at 8:49 AM Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> Quite a long time ago Robert asked me about the possibility of an
>> incremental JSON parser. I wrote one, and I've tweaked it a bit, but the
>> performance is significantly worse that that of the current Recursive
>> Descent parser.
> The prediction stack is neat. It seems like the main loop is hit so
> many thousands of times that micro-optimization would be necessary...
> I attached a sample diff to get rid of the strlen calls during
> push_prediction(), which speeds things up a bit (8-15%, depending on
> optimization level) on my machines.
>
> Maybe it's possible to condense some of those productions down, and
> reduce the loop count? E.g. does every "scalar" production need to go
> three times through the loop/stack, or can the scalar semantic action
> just peek at the next token prediction and do all the callback work at
> once?
>
>> + case JSON_SEM_SCALAR_CALL:
>> + {
>> + json_scalar_action sfunc = sem->scalar;
>> +
>> + if (sfunc != NULL)
>> + (*sfunc) (sem->semstate, scalar_val, scalar_tok);
>> + }
> Is it safe to store state (scalar_val/scalar_tok) on the stack, or
> does it disappear if the parser hits an incomplete token?
>
>> One possible use would be in parsing large manifest files for
>> incremental backup.
> I'm keeping an eye on this thread for OAuth, since the clients have to
> parse JSON as well. Those responses tend to be smaller, though, so
> you'd have to really be hurting for resources to need this.
>
I've incorporated your suggestion, and fixed the bug you identified.
The attached also adds support for incrementally parsing backup
manifests, and uses that in the three places we call the manifest parser.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
v3-0001-Introduce-a-non-recursive-JSON-parser.patch | text/x-patch | 420.1 KB |
v3-0002-Add-support-for-incrementally-parsing-backup-mani.patch | text/x-patch | 7.6 KB |
v3-0003-Use-incremental-parsing-of-backup-manifests.patch | text/x-patch | 11.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Kirk Wolak | 2024-01-15 15:49:19 | Re: Oom on temp (un-analyzed table caused by JIT) V16.1 |
Previous Message | Alvaro Herrera | 2024-01-15 15:37:49 | minor replication slot docs edits |