From: | James Coleman <jtc331(at)gmail(dot)com> |
---|---|
To: | torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Greg Stark <stark(at)mit(dot)edu>, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, david(dot)christensen(at)crunchydata(dot)com, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Subject: | Re: RFC: Logging plan of the running query |
Date: | 2023-06-12 15:52:01 |
Message-ID: | CAAaqYe9u-ywvRX32iBj1SkQrLf1hqMhR9HFCc0dAk4rGDJpyFw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, Jun 11, 2023 at 11:07 PM torikoshia <torikoshia(at)oss(dot)nttdata(dot)com> wrote:
>
> On 2023-06-06 03:26, James Coleman wrote:
> > On Mon, Jun 5, 2023 at 4:30 AM torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
> > wrote:
> >>
> >> On 2023-06-03 02:51, James Coleman wrote:
> >> > Hello,
> >> >
> >> > Thanks for working on this patch!
> >
> > Sure thing! I'm *very interested* in seeing this available, and I
> > think it paves the way for some additional features later on...
> >
> >> > On Thu, Dec 8, 2022 at 12:10 AM torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
> >> ...
> >> > To put it positively: we believe that, for example, catalog accesses
> >> > inside CHECK_FOR_INTERRUPTS() -- assuming that the CFI call is inside
> >> > an existing valid transaction/query state, as it would be for this
> >> > patch -- are safe. If there were problems, then those problems are
> >> > likely bugs we already have in other CFI cases.
> >>
> >> Thanks a lot for the discussion and sharing it!
> >> I really appreciate it.
> >>
> >> BTW I'm not sure whether all the CFI are called in valid transaction,
> >> do you think we should check each of them?
> >
> > I kicked off the regressions tests with a call to
> > ProcessLogQueryPlanInterrupt() in every single CHECK_FOR_INTERRUPTS()
> > call. Several hours and 52 GB of logs later I have confirmed that
> > (with the attached revision) at the very least the regression test
> > suite can't trigger any kind of failures regardless of when we trigger
> > this. The existing code in the patch for only running the explain when
> > there's an active query handling that.
>
> Thanks for the testing!
>
> > I've attached v27. The important change here in 0001 is that it
> > guarantees the interrupt handler is re-entrant, since that was a bug
> > exposed by my testing. I've also included 0002 which is only meant for
> > testing (it attempts to log in the plan in every
> > CHECK_FOR_INTERRUPTS() call).
>
> When SIGINT is sent during ProcessLogQueryPlanInterrupt(),
> ProcessLogQueryPlanInterruptActive can remain true.
> After that, pg_log_query_plan() does nothing but just returns.
>
> AFAIU, v26 logs plan for each pg_log_query_plan() even when another
> pg_log_query_plan() is running, but it doesn't cause errors or critical
> problem.
>
> Considering the problem solved and introduced by v27, I think v26 might
> be fine.
> How do you think?
The testing I did with calling this during every CFI is what uncovered
the re-entrancy problem. IIRC (without running that test again) the
problem was a stack overflow. Now, to be sure this is a particularly
degenerate case because in real-world usage it'd be impossible in
practice, I think, to trigger that many calls to this function (and by
extension the interrupt handler).
If SIGINT is the only concern we could reset
ProcessLogQueryPlanInterruptActive in error handling code. I admit
that part of my thought process here is thinking ahead to an
additional patch I'd like to see on top of this, which is logging a
query plan before cleaning up when statement timeout occurs. The
re-entrancy issue becomes more interesting then, I think, since we
would then have automated calling of the logging code. BTW: I'd
thought that would make a nice follow-up patch for this, but if you'd
prefer I could add it as another patch in the series here.
What do you think about resetting the flag versus just not having it?
Regards,
James Coleman
From | Date | Subject | |
---|---|---|---|
Next Message | Juan José Santamaría Flecha | 2023-06-12 16:12:36 | Re: Inconsistent results with libc sorting on Windows |
Previous Message | Joel Jacobson | 2023-06-12 15:43:28 | Re: Do we want a hashset type? |