Re: pg_parse_json() should not leak token copies on failure

From: Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_parse_json() should not leak token copies on failure
Date: 2024-06-04 17:10:06
Message-ID: CAOYmi+=toTcJ7BS+=74-7=2GARU4sPW_JRxZzp=ur=i9sV2jUQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 3, 2024 at 9:32 PM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> Hi,

Thanks for the review!

> I understand that the part enclosed in parentheses refers to the "if
> (ptr) pfree(ptr)" structure. I believe we rely on the behavior of
> free(NULL), which safely does nothing. (I couldn't find the related
> discussion due to a timeout error on the ML search page.)

For the frontend, right. For the backend, pfree(NULL) causes a crash.
I think [1] is a related discussion on the list, maybe the one you
were looking for?

> Although I don't fully understand the entire parser code, it seems
> that the owner transfer only occurs in JSON_TOKEN_STRING cases. That
> is, the memory pointed by scalar_val would become dangling in
> JSON_TOKEN_NUBMER cases.

It should happen for both cases, I think. I see that the
JSON_SEM_SCALAR_CALL case passes scalar_val into the callback
regardless of whether we're parsing a string or a number, and
parse_scalar() does the same thing over in the recursive
implementation. Have I missed a code path?

> Even if this is not the case, the ownership
> transition apperas quite callenging to follow.

I agree!

> It might be safer or clearer to pstrdup the token in jsonb_in_scalar()
> and avoid NULLifying scalar_val after calling callbacks, or to let
> jsonb_in_sclar() NULLify the pointer.

Maybe. jsonb_in_scalar isn't the only scalar callback implementation,
though. It might be a fairly cruel change for dependent extensions,
since there will be no compiler complaint.

If a compatibility break is acceptable, maybe we should make the API
zero-copy for the recursive case, and just pass the length of the
parsed token? Either way, we'd have to change the object field
callbacks, too, but maybe an API change makes even more sense there,
given how convoluted it is right now.

Thanks,
--Jacob

[1] https://www.postgresql.org/message-id/flat/cf26e970-8e92-59f1-247a-aa265235075b%40enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2024-06-04 17:12:04 Re: Partial aggregates pushdown
Previous Message Peter Eisentraut 2024-06-04 16:56:34 Re: meson "experimental"?