From: | Jelte Fennema-Nio <me(at)jeltef(dot)nl> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Lakhin <exclusion(at)gmail(dot)com> |
Subject: | Always show correct error message for statement timeouts, fixes random buildfarm failures |
Date: | 2025-02-18 09:29:35 |
Message-ID: | CAGECzQQD5f9ce-D1v6vCTT94hG=JbGdTHuf8vQ=HBLz9T4dA1g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
(was part of "Add non-blocking version of PQcancel" thread[1])
The commitfest app notified me that this patch needed a rebase. While
doing that I'm also moving this patch out of the original thread it
was part of. It fixes a bug that causes this buildfarm failure[2]. In
rare cases the current code would report "canceling statement due to
user request", even though it was actually canceled due to a statement
timeout, so it should have this error message: "canceling statement
due to statement timeout". A way to reproduce this problem locally was
provided by Alexander[3].
I took it out of the original thread because that has become
impossible to follow (at least for me, so I expect also for others)
due to discussing 3 different random failures involving cancel
requests. And this specific one doesn't even have anything to do with
the original non-blocking PQCancel work.
I have inlined the parts of emails that are relevant to this patch
below. That way others don't have to dissect that thread. (for context
this patch is often, but not always, called patch 0002 in the original
thread).
My initial email containing the patch[4]:
On Mon, 24 Jun 2024 at 00:59, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl> wrote:
> On Sat, 22 Jun 2024 at 17:00, Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
> > I also came across another failure of the test:
> > @@ -2774,7 +2774,7 @@
> > BEGIN;
> > SET LOCAL statement_timeout = '10ms';
> > select count(*) from ft1 CROSS JOIN ft2 CROSS JOIN ft4 CROSS JOIN ft5; -- this takes very long
> > -ERROR: canceling statement due to statement timeout
> > +ERROR: canceling statement due to user request
> > COMMIT;
> >
> > which is reproduced with a sleep added here:
> > @@ -1065,6 +1065,7 @@ exec_simple_query(const char *query_string)
> > */
> > parsetree_list = pg_parse_query(query_string);
> > +pg_usleep(11000);
>
> After investigating, I realized this actually exposes a bug in our
> statement timeout logic. It has nothing to do with posgres_fdw and
> reproduces with any regular postgres query too. Attached is a patch
> that fixes this issue. This one should probably be backported.
Tom's review[5]:
On Fri, 30 Aug 2024 at 22:12, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I think 0002 is probably outright wrong, or at least the change to
> disable_statement_timeout is. Once we get to that, we don't want
> to throw a timeout error any more, even if an interrupt was received
> just before it.
My initial response to that review[6]:
On Fri, 30 Aug 2024 at 23:24, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl> wrote:
> The disable_statement_timeout change was not the part of that patch
> that was necessary for stable output, only the change in the first
> branch of enable_statement_timeout was necessary. The reason being
> that enable_statement_timeout is called multiple times for a query,
> because start_xact_command is called multiple times in
> exec_simple_query. The change to disable_statement_timeout just seemed
> like the logical extension of that change, especially since there was
> basically a verbatim copy of disable_statement_timeout in the second
> branch of enable_statement_timeout.
>
> To make sure I understand your suggestion correctly: Are you saying
> you would want to completely remove the outstanding interrupt if it
> was caused by de statement_timout when disable_statement_timeout is
> called? Because I agree that would probably make sense, but that
> sounds like a more impactful change. But the current behaviour seems
> strictly worse than the behaviour proposed in the patch to me, because
> currently the backend would still be interrupted, but the error would
> indicate a reason for the interrupt that is simply incorrect i.e. it
> will say it was cancelled due to a user request, which never happened.
My second response to that review[7]
On Sat, 31 Aug 2024 at 09:08, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl> wrote:
> I also took a quick look at the code again, and completely removing
> the outstanding interrupt seems hard to do. Because there's no way to
> know if there were multiple causes for the interupt, i.e. someone
> could have pressed ctrl+c as well and we wouldn't want to undo that.
>
> So I think the solution in 0002, while debatable if strictly correct,
> is the only fix that we can easily do. Also I personally believe the
> behaviour resulting from 0002 is totally correct: The new behaviour
> would be that if a timeout occurred, right before it was disabled or
> reset, but the interrupt was not processed yet, then we process that
> timeout as normal. That seems totally reasonable behaviour to me from
> the perspective of an end user: You get a timeout error when the
> timeout occurred before the timeout was disabled/reset.
[1]: https://www.postgresql.org/message-id/flat/AM5PR83MB0178D3B31CA1B6EC4A8ECC42F7529(at)AM5PR83MB0178(dot)EURPRD83(dot)prod(dot)outlook(dot)com
[2]: https://wiki.postgresql.org/wiki/Known_Buildfarm_Test_Failures#timeouts.spec_failed_because_of_statement_cancelled_due_to_unexpected_reason
[3]: https://www.postgresql.org/message-id/4db099c8-4a52-3cc4-e970-14539a319466%40gmail.com#a81203796f7257531d037e2ca6a0ac43
[4]: https://www.postgresql.org/message-id/CAGECzQSBp%3DsCcMCy1txig7cV0VX9hDDb1UdZt3zpDpZe2X2K-Q%40mail.gmail.com#a8d453bddc52f0db0b0fa80676d352c5
[5]: https://www.postgresql.org/message-id/584811.1725048717%40sss.pgh.pa.us#030de114b9a6a3eafd082a92fb95fa6e
[6]: https://www.postgresql.org/message-id/CAGECzQR2HqQTFUddz-Q6y4sfMtTA%2BLRE7WqJtpPSrJKKUXtH7w%40mail.gmail.com#702f7a3fe3936b30bdaad0c5264f5a08
[7]: https://www.postgresql.org/message-id/CAGECzQQM1N0H3e9RbSqi4%3DmmFgfLaFNr6R7gNpFoePb9exRryg%40mail.gmail.com#f11b82da794d5996eca82339c2a17caa
Attachment | Content-Type | Size |
---|---|---|
v5-0001-Do-not-reset-statement_timeout-indicator-outside-.patch | text/x-patch | 2.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Yuki Seino | 2025-02-18 09:33:11 | Re: Add “FOR UPDATE NOWAIT” lock details to the log. |
Previous Message | Amit Kapila | 2025-02-18 09:26:07 | Re: Added schema level support for publication. |