From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
---|---|
To: | Melanie Plageman <melanieplageman(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Jeff Davis <pgsql(at)j-davis(dot)com>, Andres Freund <andres(at)anarazel(dot)de> |
Subject: | Re: stats test intermittent failure |
Date: | 2023-07-11 18:00:00 |
Message-ID: | 8d526f29-c269-dc3d-38ac-10fa3a0a7fb3@gmail.com |
Lists: | pgsql-hackers |
Hi Melanie,
10.07.2023 21:35, Melanie Plageman wrote:
> Hi,
>
> Jeff pointed out that one of the pg_stat_io tests has failed a few times
> over the past months (here on morepork [1] and more recently here on
> francolin [2]).
>
> Failing test diff for those who prefer not to scroll:
>
> +++ /home/bf/bf-build/francolin/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/results/stats.out
> 2023-07-07 18:48:25.976313231 +0000
> @@ -1415,7 +1415,7 @@
> :io_sum_vac_strategy_after_reuses > :io_sum_vac_strategy_before_reuses;
> ?column? | ?column?
> ----------+----------
> - t | t
> + t | f
>
> My theory about the test failure is that, when there is enough demand
> for shared buffers, the flapping test fails because it expects buffer
> access strategy *reuses* and concurrent queries already flushed those
> buffers before they could be reused. Attached is a patch which I think
> will fix the test while keeping some code coverage. If we count
> evictions and reuses together, those should have increased.
I managed to reproduce that failure with the attached patch applied
(on master), using the following script (which effectively multiplies
the probability of the failure by 360):
CPPFLAGS="-O0" ./configure -q --enable-debug --enable-cassert --enable-tap-tests && \
  make -s -j`nproc` && make -s check -C src/test/recovery
mkdir -p src/test/recovery00/t
cp src/test/recovery/t/027_stream_regress.pl src/test/recovery00/t/
cp src/test/recovery/Makefile src/test/recovery00/
for ((i=1;i<=9;i++)); do cp -r src/test/recovery00/ src/test/recovery$i; done
for ((i=1;i<=10;i++)); do
  echo "iteration $i"
  NO_TEMP_INSTALL=1 parallel --halt now,fail=1 -j9 --linebuffer --tag make -s check -C src/test/{} ::: recovery1 recovery2 recovery3 recovery4 recovery5 recovery6 recovery7 recovery8 recovery9 || break
done
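The outer loop of the script (repeat a command, stop at the first failure) can also be written as a small helper. This is only a sketch: the name repeat_until_failure and its ruf_* variables are hypothetical and are not part of the script I actually ran.

```shell
# Hypothetical helper, not part of the script used for the results in this
# message. Runs the given command up to $1 times and stops at the first
# failure, printing "iteration N" before each run like the loop in the script.
repeat_until_failure() {
    ruf_max=$1
    shift
    ruf_i=1
    while [ "$ruf_i" -le "$ruf_max" ]; do
        echo "iteration $ruf_i"
        if ! "$@"; then
            echo "failed on iteration $ruf_i" >&2
            return 1
        fi
        ruf_i=$((ruf_i + 1))
    done
    return 0
}

# Possible usage (assumes the recovery1..recovery9 copies created by the script):
#   export NO_TEMP_INSTALL=1
#   repeat_until_failure 10 parallel --halt now,fail=1 -j9 --linebuffer --tag \
#       make -s check -C src/test/{} ::: recovery1 recovery2 recovery3 \
#       recovery4 recovery5 recovery6 recovery7 recovery8 recovery9
```

The helper returns non-zero as soon as one run fails, so the first failing iteration stays at the end of the output.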
Without your patch, I get:
iteration 2
...
recovery5 # Failed test 'regression tests pass'
recovery5 # at t/027_stream_regress.pl line 92.
recovery5 # got: '256'
recovery5 # expected: '0'
...
src/test/recovery5/tmp_check/log/regress_log_027_stream_regress contains:
--- .../src/test/regress/expected/stats.out 2023-07-11 20:05:10.536059706 +0300
+++ .../src/test/recovery5/tmp_check/results/stats.out 2023-07-11 20:30:46.790551305 +0300
@@ -1418,7 +1418,7 @@
:io_sum_vac_strategy_after_reuses > :io_sum_vac_strategy_before_reuses;
?column? | ?column?
----------+----------
- t | t
+ t | f
(1 row)
With your patch applied, all 10 iterations completed successfully for me.
So it looks like both your theory and your fix are correct.
Best regards,
Alexander
Attachment | Content-Type | Size |
---|---|---|
regress-repeat-stats-line.patch | text/x-patch | 18.2 KB |