Re: fairywren timeout failures on the pg_amcheck/005_opclass_damage test

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: fairywren timeout failures on the pg_amcheck/005_opclass_damage test
Date: 2024-07-26 12:41:05
Message-ID: f83ec95c-92b2-4cfe-8368-a0fd22db957c@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 2024-07-26 Fr 7:00 AM, Alexander Lakhin wrote:
> 25.07.2024 15:00, Alexander Lakhin wrote:
>
>>
>> The other question is: why is 005_opclass_damage taking so much time
>> there?
>> ...
>> $ make -s check -C src/bin/pg_amcheck/ PROVE_TESTS="t/005*"
>> PROVE_FLAGS="--timer"
>> [11:11:53] t/005_opclass_damage.pl .. ok     1370 ms ( 0.00 usr 0.00
>> sys +  0.10 cusr  0.07 csys =  0.17 CPU)
>>
>> $ echo "debug_parallel_query = regress" >/tmp/extra.config
>> $ TEMP_CONFIG=/tmp/extra.config make -s check -C src/bin/pg_amcheck/
>> PROVE_TESTS="t/005*" PROVE_FLAGS="--timer"
>> [11:12:46] t/005_opclass_damage.pl .. ok    40854 ms ( 0.00 usr 0.00
>> sys +  0.10 cusr  0.10 csys =  0.20 CPU)
>>
>> ...
>> So maybe at least this test should be improved for testing with
>> debug_parallel_query enabled, if such active use of parallel workers by
>> pg_amcheck can't be an issue to end users?
>>
>
> When running this test with "log_min_messages = DEBUG2" in my
> extra.config,
> I see thousands of the following messages in the test log:
> 2024-07-26 09:32:54.544 UTC [2572189:46] DEBUG:  postmaster received
> pmsignal signal
> 2024-07-26 09:32:54.544 UTC [2572189:47] DEBUG:  registering
> background worker "parallel worker for PID 2572197"
> 2024-07-26 09:32:54.544 UTC [2572189:48] DEBUG:  starting background
> worker process "parallel worker for PID 2572197"
> 2024-07-26 09:32:54.547 UTC [2572189:49] DEBUG:  unregistering
> background worker "parallel worker for PID 2572197"
> 2024-07-26 09:32:54.547 UTC [2572189:50] DEBUG:  background worker
> "parallel worker" (PID 2572205) exited with exit code 0
> 2024-07-26 09:32:54.547 UTC [2572189:51] DEBUG:  postmaster received
> pmsignal signal
> 2024-07-26 09:32:54.547 UTC [2572189:52] DEBUG:  registering
> background worker "parallel worker for PID 2572197"
> 2024-07-26 09:32:54.547 UTC [2572189:53] DEBUG:  starting background
> worker process "parallel worker for PID 2572197"
> 2024-07-26 09:32:54.549 UTC [2572189:54] DEBUG:  unregistering
> background worker "parallel worker for PID 2572197"
> 2024-07-26 09:32:54.549 UTC [2572189:55] DEBUG:  background worker
> "parallel worker" (PID 2572206) exited with exit code 0
> ...
>
> grep ' registering background worker' \
>  src/bin/pg_amcheck/tmp_check/log/005_opclass_damage_test.log | wc -l
> 15669
>
> So this test launches more than 15000 processes when debug_parallel_query
> is enabled.
>
> As far as I can see, this is happening because of the "PARALLEL UNSAFE"
> marking is ignored when the function is called by CREATE INDEX/amcheck.
>
> Namely, with a function defined as:
>     CREATE FUNCTION int4_asc_cmp (a int4, b int4) RETURNS int LANGUAGE
> sql AS $$
>         SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN 1 ELSE -1
> END; $$;
>
> SELECT int4_asc_cmp(1, 2);
> executed without parallel workers. Whilst when it's used by an index:
> CREATE OPERATOR CLASS int4_fickle_ops FOR TYPE int4 USING btree AS
> ...
> OPERATOR 5 > (int4, int4), FUNCTION 1 int4_asc_cmp(int4, int4);
>
> INSERT INTO int4tbl (SELECT * FROM generate_series(1,1000) gs);
>
> CREATE INDEX fickleidx ON int4tbl USING btree (i int4_fickle_ops);
> launches 1000 parallel workers.
>
> (This is reminiscent of bug #18314.)
>
> One way to workaround this is to disable debug_parallel_query in the test
> and another I find possible is to set max_parallel_workers = 0.
>
>

But wouldn't either of those just be masking the problem?

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2024-07-26 12:45:20 Re: Detailed release notes
Previous Message Heikki Linnakangas 2024-07-26 12:40:50 Re: Direct SSL connection and ALPN loose ends