From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Euler Taveira <euler(at)eulerto(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Add a perl function in Cluster.pm to generate WAL |
Date: | 2024-01-05 20:00:00 |
Message-ID: | 2d39b5fa-ba90-48c8-4cde-ecba4a62fa3b@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
05.01.2024 02:48, Michael Paquier wrote:
> On Thu, Jan 04, 2024 at 04:00:01PM +0300, Alexander Lakhin wrote:
>> Reproduced here.
> Did you just make the run slow enough to show the failure with
> valgrind?
Yes, I just run several test instances (with no extra modifications) under
Valgrind with parallel as follows:
for i in `seq 20`; do cp -r src/test/recovery/ src/test/recovery_$i/; sed "s|src/test/recovery|src/test/recovery_$i|" -i
src/test/recovery_$i/Makefile; done
for i in `seq 20`; do echo "iteration $i"; parallel --halt now,fail=1 -j20 --linebuffer --tag PROVE_TESTS="t/035*"
NO_TEMP_INSTALL=1 make check -s -C src/test/recovery_{} PROVE_FLAGS="--timer" ::: `seq 20` || break; done
The test run fails for me on iterations 9, 8, 14 like so:
...
iteration 9
...
5
5 # Failed test 'inactiveslot slot invalidation is logged with vacuum on pg_authid'
5 # at t/035_standby_logical_decoding.pl line 224.
5
5 # Failed test 'activeslot slot invalidation is logged with vacuum on pg_authid'
5 # at t/035_standby_logical_decoding.pl line 229.
13 [07:35:53] t/035_standby_logical_decoding.pl .. ok 432930 ms ( 0.02 usr 0.00 sys + 292.08 cusr 13.05 csys =
305.15 CPU)
13 [07:35:53]
13 All tests successful.
...
I've also reproduced it without Valgrind in a VM with CPU slowed down to
5% (on iterations 2, 5, 4), where average test duration is about 800 sec:
4
4 # Failed test 'inactiveslot slot invalidation is logged with vacuum on pg_authid'
4 # at t/035_standby_logical_decoding.pl line 222.
4
4 # Failed test 'activeslot slot invalidation is logged with vacuum on pg_authid'
4 # at t/035_standby_logical_decoding.pl line 227.
6 [15:16:53] t/035_standby_logical_decoding.pl .. ok 763168 ms ( 0.68 usr 0.10 sys + 19.55 cusr 102.59 csys =
122.92 CPU)
>
>> As I can see in the failure logs you referenced, the first problem is:
>> # Failed test 'inactiveslot slot invalidation is logged with vacuum on pg_authid'
>> # at t/035_standby_logical_decoding.pl line 224.
>> It reminded me of:
>> https://www.postgresql.org/message-id/b2a1f7d0-7629-72c0-2da7-e9c4e336b010%40gmail.com
>>
>> It seems that it's not something new, and maybe that my analysis is still
>> valid. If so, VACUUM FREEZE/autovacuum = off should fix the issue.
> Not sure about that yet.
>
Your suspicion was proved right. After
git show c161ab74f src/test/recovery/t/035_standby_logical_decoding.pl | git apply -R
20 iterations with 20 tests in parallel performed successfully for me
(twice).
So it looks like c161ab74f really made the things worse.
Best regards,
Alexander
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2024-01-05 20:02:34 | Re: pg_upgrade failing for 200+ million Large Objects |
Previous Message | Robert Haas | 2024-01-05 19:58:55 | Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows |