Re: Add a perl function in Cluster.pm to generate WAL

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Euler Taveira <euler(at)eulerto(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add a perl function in Cluster.pm to generate WAL
Date: 2024-01-05 20:00:00
Message-ID: 2d39b5fa-ba90-48c8-4cde-ecba4a62fa3b@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

05.01.2024 02:48, Michael Paquier wrote:
> On Thu, Jan 04, 2024 at 04:00:01PM +0300, Alexander Lakhin wrote:
>> Reproduced here.
> Did you just make the run slow enough to show the failure with
> valgrind?

Yes, I just run several test instances (with no extra modifications) under
Valgrind with parallel as follows:
for i in `seq 20`; do cp -r src/test/recovery/ src/test/recovery_$i/; sed "s|src/test/recovery|src/test/recovery_$i|" -i
src/test/recovery_$i/Makefile; done

for i in `seq 20`; do echo "iteration $i"; parallel --halt now,fail=1 -j20 --linebuffer --tag PROVE_TESTS="t/035*"
NO_TEMP_INSTALL=1 make check -s -C src/test/recovery_{} PROVE_FLAGS="--timer" ::: `seq 20` || break; done

The test run fails for me on iterations 9, 8, 14 like so:
...
iteration 9
...
5
5       #   Failed test 'inactiveslot slot invalidation is logged with vacuum on pg_authid'
5       #   at t/035_standby_logical_decoding.pl line 224.
5
5       #   Failed test 'activeslot slot invalidation is logged with vacuum on pg_authid'
5       #   at t/035_standby_logical_decoding.pl line 229.
13      [07:35:53] t/035_standby_logical_decoding.pl .. ok   432930 ms ( 0.02 usr  0.00 sys + 292.08 cusr 13.05 csys =
305.15 CPU)
13      [07:35:53]
13      All tests successful.
...

I've also reproduced it without Valgrind in a VM with CPU slowed down to
5% (on iterations 2, 5, 4), where average test duration is about 800 sec:

4
4       #   Failed test 'inactiveslot slot invalidation is logged with vacuum on pg_authid'
4       #   at t/035_standby_logical_decoding.pl line 222.
4
4       #   Failed test 'activeslot slot invalidation is logged with vacuum on pg_authid'
4       #   at t/035_standby_logical_decoding.pl line 227.
6       [15:16:53] t/035_standby_logical_decoding.pl .. ok   763168 ms ( 0.68 usr  0.10 sys + 19.55 cusr 102.59 csys =
122.92 CPU)

>
>> As I can see in the failure logs you referenced, the first problem is:
>> #   Failed test 'inactiveslot slot invalidation is logged with vacuum on pg_authid'
>> #   at t/035_standby_logical_decoding.pl line 224.
>> It reminded me of:
>> https://www.postgresql.org/message-id/b2a1f7d0-7629-72c0-2da7-e9c4e336b010%40gmail.com
>>
>> It seems that it's not something new, and maybe that my analysis is still
>> valid. If so, VACUUM FREEZE/autovacuum = off should fix the issue.
> Not sure about that yet.
>

Your suspicion was proved right. After
git show c161ab74f src/test/recovery/t/035_standby_logical_decoding.pl  | git apply -R
20 iterations with 20 tests in parallel performed successfully for me
(twice).

So it looks like c161ab74f really made the things worse.

Best regards,
Alexander

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-01-05 20:02:34 Re: pg_upgrade failing for 200+ million Large Objects
Previous Message Robert Haas 2024-01-05 19:58:55 Re: pg_ctl start may return 0 even if the postmaster has been already started on Windows