Re: WIP Incremental JSON Parser

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: WIP Incremental JSON Parser
Date: 2024-01-03 15:12:37
Message-ID: CA+TgmoZYB2xDi76w5EnuZHYtpyEEoq_Kt5c0rXPZ2XZ-MYhtxw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 3, 2024 at 9:59 AM Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> Say we have a document with an array of 1m objects, each with a field
> called "color". As it stands we'll allocate space for that field name 1m
> times. Using a hash table we'd allocate space for it once. And
> allocating the memory isn't free, although it might be cheaper than
> doing hash lookups.
>
> I guess we can benchmark it and see what the performance impact of using
> a hash table might be.
>
> Another possibility would be simply to have the callback free the field
> name after use. For the parse_manifest code that could be a one-line
> addition to the code at the bottom of json_object_manifest_field_start().

Yeah. So I'm arguing that allocating the memory each time and then
freeing it sounds cheaper than looking it up in the hash table every
time, discovering it's there, and thus skipping the allocate/free.

I might be wrong about that. It's just that allocating and freeing a
small chunk of memory should boil down to popping it off of a linked
list and then pushing it back on. And that sounds cheaper than hashing
the string and looking for it in a hash bucket.

--
Robert Haas
EDB: http://www.enterprisedb.com
