From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
---|---|
To: | Noah Misch <noah(at)leadboat(dot)com>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>, Denis Laxalde <denis(dot)laxalde(at)dalibo(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Jelte Fennema <Jelte(dot)Fennema(at)microsoft(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> |
Subject: | Re: [EXTERNAL] Re: Add non-blocking version of PQcancel |
Date: | 2024-06-22 15:00:01 |
Message-ID: | 4db099c8-4a52-3cc4-e970-14539a319466@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello hackers,
30.03.2024 01:17, Noah Misch wrote:
> On Fri, Mar 29, 2024 at 09:17:55AM +0100, Jelte Fennema-Nio wrote:
>> On Thu, 28 Mar 2024 at 19:03, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Could we make this test bulletproof by using an injection point?
>>> If not, I remain of the opinion that we're better off without it.
>> Possibly, and if so, I agree that would be better than the currently
>> added test. But I honestly don't feel like spending the time on
>> creating such a test.
> The SQL test is more representative of real applications, and it's way simpler
> to understand. In general, I prefer 6-line SQL tests that catch a problem 10%
> of the time over injection point tests that catch it 100% of the time. For
> low detection rate to be exciting, it needs to be low enough to have a serious
> chance of all buildfarm members reporting green for the bad commit. With ~115
> buildfarm members running in the last day, 0.1% detection rate would have been
> low enough to bother improving, but 4% would be high enough to call it good.
As a recent buildfarm failure on orlingo (which tests asan-enabled builds)
[1] shows, that test can still fail:
70/70 postgresql:postgres_fdw-running / postgres_fdw-running/regress ERROR 278.67s exit status 1
@@ -2775,6 +2775,7 @@
SET LOCAL statement_timeout = '10ms';
select count(*) from ft1 CROSS JOIN ft2 CROSS JOIN ft4 CROSS JOIN ft5; -- this takes very long
ERROR: canceling statement due to statement timeout
+WARNING: could not get result of cancel request due to timeout
COMMIT;
(from the next run we can see normal duration:
"postgres_fdw-running/regress OK 6.30s ")
I reproduced the failure with an asan-enabled build on a slowed-down VM
and as far as I can see, it's caused by the following condition in
ProcessInterrupts():
/*
* If we are reading a command from the client, just ignore the cancel
* request --- sending an extra error message won't accomplish
* anything. Otherwise, go ahead and throw the error.
*/
if (!DoingCommandRead)
{
LockErrorCleanup();
ereport(ERROR,
(errcode(ERRCODE_QUERY_CANCELED),
errmsg("canceling statement due to user request")));
}
I think this failure can be reproduced easily (without asan/slowing down)
with this modification:
@@ -4630,6 +4630,7 @@ PostgresMain(const char *dbname, const char *username)
idle_session_timeout_enabled = false;
}
+if (rand() % 10 == 0) pg_usleep(10000);
/*
* (5) disable async signal conditions again.
*
Running this test in a loop (for ((i=1;i<=100;i++)); do \
echo "iteration $i"; make -s check -C contrib/postgres_fdw/ || break; \
done), I get:
...
iteration 56
# +++ regress check in contrib/postgres_fdw +++
# initializing database system by copying initdb template
# using temp instance on port 55312 with PID 991332
ok 1 - postgres_fdw 20093 ms
1..1
# All 1 tests passed.
iteration 57
# +++ regress check in contrib/postgres_fdw +++
# initializing database system by copying initdb template
# using temp instance on port 55312 with PID 992152
not ok 1 - postgres_fdw 62064 ms
1..1
...
--- .../contrib/postgres_fdw/expected/postgres_fdw.out 2024-06-22 02:52:42.991574907 +0000
+++ .../contrib/postgres_fdw/results/postgres_fdw.out 2024-06-22 14:43:43.949552927 +0000
@@ -2775,6 +2775,7 @@
SET LOCAL statement_timeout = '10ms';
select count(*) from ft1 CROSS JOIN ft2 CROSS JOIN ft4 CROSS JOIN ft5; -- this takes very long
ERROR: canceling statement due to statement timeout
+WARNING: could not get result of cancel request due to timeout
COMMIT;
I also came across another failure of the test:
@@ -2774,7 +2774,7 @@
BEGIN;
SET LOCAL statement_timeout = '10ms';
select count(*) from ft1 CROSS JOIN ft2 CROSS JOIN ft4 CROSS JOIN ft5; -- this takes very long
-ERROR: canceling statement due to statement timeout
+ERROR: canceling statement due to user request
COMMIT;
which is reproduced with a sleep added here:
@@ -1065,6 +1065,7 @@ exec_simple_query(const char *query_string)
*/
parsetree_list = pg_parse_query(query_string);
+pg_usleep(11000);
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=olingo&dt=2024-06-20%2009%3A52%3A04
Best regards,
Alexander
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2024-06-22 15:04:11 | Re: replace strtok() |
Previous Message | Peter Geoghegan | 2024-06-22 14:53:31 | Re: FreezeLimit underflows in pg14 and 15 causing incorrect behavior in heap_prepare_freeze_tuple |