From: | Will Storey <will(at)summercat(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: Unexpected "canceling statement due to user request" error |
Date: | 2019-09-02 00:21:20 |
Message-ID: | 20190902002120.jp4xjlfuvkwwksr3@dev.null |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Sun 2019-09-01 19:46:19 -0400, Tom Lane wrote:
> I poked at this for awhile and concluded that what you must be seeing is
> that the statement timeout interrupt triggers, but no CHECK_FOR_INTERRUPTS
> call happens thereafter, until we get to the disable_statement_timeout()
> call in finish_xact_command(). That throws away the timeout-happened
> indicator, but it does nothing to reset InterruptPending or
> QueryCancelPending. Those flags will also be thrown away eventually,
> after we've read a new client command:
>
> /*
> * (4) disable async signal conditions again.
> *
> * Query cancel is supposed to be a no-op when there is no query in
> * progress, so if a query cancel arrived while we were idle, just
> * reset QueryCancelPending. ProcessInterrupts() has that effect when
> * it's called when DoingCommandRead is set, so check for interrupts
> * before resetting DoingCommandRead.
> */
> CHECK_FOR_INTERRUPTS();
> DoingCommandRead = false;
>
> Ignoring the timeout if it happened too late is what we're supposed
> to be doing, so that's fine so far.
>
> But: what happens if some other CHECK_FOR_INTERRUPTS occurs before we
> reach the main input loop? Of course, exactly the symptom you saw:
> a bogus "canceling statement due to user request" error report.
> What's worse, that happens after we've completed the transaction, leaving
> the client with a very misleading idea of whether the command committed
> or not.
>
> So how could that happen? After experimenting, I found a pretty
> obvious candidate: statement duration logging happens after we
> do finish_xact_command, and there's a CHECK_FOR_INTERRUPTS in
> errfinish(). So I hypothesize that you have log_duration or
> log_min_duration_statement enabled.
Yes, I have log_min_duration_statement set to 0! (log_duration is off)
> I was able to replicate the problem by lobotomizing a couple of
> CHECK_FOR_INTERRUPTS that would normally happen in a simple
> "select pg_sleep()" query (see attached for-demonstration-
> purposes-only patch). Then I get
>
> regression=# set statement_timeout TO '1s';
> SET
> regression=# select pg_sleep(2.0); -- completes normally after 2 sec
> pg_sleep
> ----------
>
> (1 row)
>
> regression=# set log_min_duration_statement TO '100ms';
> SET
> regression=# select pg_sleep(2.0); -- sits for 2 sec, then:
> ERROR: canceling statement due to user request
>
> So that's not very good and we ought to do something about it.
> We can't have disable_statement_timeout() clear QueryCancelPending,
> because that might lose state about some other cause for query
> cancel. I'm inclined to think that we need to split QueryCancelPending
> into more than one flag.
>
> A separate question is how come the particular query you're complaining
> about has (seemingly) a fairly wide window where it never does any
> CHECK_FOR_INTERRUPTS call before terminating. Perhaps there's someplace
> we need to sprinkle a few more of those.
Yeah, it is strange. There are many queries in this system with statement
timeouts and this is the only one where I see this and it happens several
times a day.
Thank you so much for looking into this. An impressive analysis!
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2019-09-02 00:58:30 | Re: Unexpected "canceling statement due to user request" error |
Previous Message | Tom Lane | 2019-09-01 23:46:19 | Re: Unexpected "canceling statement due to user request" error |