Re: Improve pg_re_throw: check if sigjmp_buf is valid and report error

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Xiaoran Wang <fanfuxiaoran(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improve pg_re_throw: check if sigjmp_buf is valid and report error
Date: 2024-08-19 14:12:43
Message-ID: 1361641.1724076763@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Aug 19, 2024 at 2:17 AM Xiaoran Wang <fanfuxiaoran(at)gmail(dot)com> wrote:
>> If the code in PG_TRY contains any non local control flow other than
>> ereport(ERROR) like goto, break etc., the PG_CATCH or PG_END_TRY cannot
>> be called, then the PG_exception_stack will point to the memory whose
>> stack frame has been released. So after that, when the pg_re_throw
>> called, __longjmp() will crash and report Segmentation fault error.
>>
>> Addition to sigjmp_buf, add another field 'int magic' which is next to
>> the sigjum_buf in the local stack frame memory. The magic's value is always
>> 'PG_exception_magic 0x12345678'. And in 'pg_re_throw' routine, check if
>> the magic's value is still '0x12345678', if not, that means the memory
>> where the 'PG_exception_stack' points to has been released, and the 'sigbuf'
>> must be invalid.

> This is an interesting idea. I suspect if we do this we want a
> different magic number and a different error message than what you
> chose here, but those are minor details.

I would suggest just adding an Assert; I doubt this check would be
useful in production.

> I'm not sure how reliable this would be at actually finding problems,
> though.

Yeah, that's the big problem. I don't have any confidence at all
that this would detect misuse. It'd require that the old stack
frame gets overwritten, which might not happen for a long time,
and it'd require that somebody eventually do a longjmp, which again
might not happen for a long time --- and when those did happen, the
error would be detected in someplace far away from the actual bug,
with little evidence remaining to help you localize it.

Also, as soon as some outer level of PG_TRY is exited in the proper
way, the dangling stack pointer will be fixed up. That means there's
a fairly narrow time frame in which the overwrite and longjmp must
happen for this to catch a bug.

So on the whole I doubt this'd be terribly useful in this form;
and I don't like the amount of code churn required.

> Personally, I don't think I've ever made this particular mistake. I
> think a much more common usage error is exiting the catch-block
> without either throwing an error or rolling back a subtransaction. But
> YMMV, of course.

We have had multiple instances of code "return"ing out of a PG_TRY,
so I fully agree that some better way to detect that would be good.
But maybe we ought to think about static analysis for that.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bertrand Drouvot 2024-08-19 14:13:37 Re: define PG_REPLSLOT_DIR
Previous Message Bertrand Drouvot 2024-08-19 14:11:55 Re: define PG_REPLSLOT_DIR