Re: BF member drongo doesn't like 035_standby_logical_decoding.pl

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrew Dunstan <andrew(at)dunslane(dot)net>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>
Subject: Re: BF member drongo doesn't like 035_standby_logical_decoding.pl
Date: 2025-01-24 19:00:00
Message-ID: 1d2b01df-6856-46b6-809e-bccc5516b8bb@gmail.com
Lists: pgsql-hackers

Hello Tom,

24.01.2025 18:42, Tom Lane wrote:
> I realized just now that drongo has been intermittently failing like this:
>
> 147/256 postgresql:recovery / recovery/035_standby_logical_decoding ERROR 2116.16s (exit status 255 or signal 127 SIGinvalid)
> ------------------------------------- 8< -------------------------------------
> stderr:
> # Failed test 'activeslot slot invalidation is logged with vacuum on pg_class'
> # at C:/prog/bf/root/REL_16_STABLE/pgsql/src/test/recovery/t/035_standby_logical_decoding.pl line 229.
> # poll_query_until timed out executing this query:
> # select (confl_active_logicalslot = 1) from pg_stat_database_conflicts where datname = 'testdb'
> # expecting this output:
> # t
> # last actual query output:
> # f
> # with stderr:
> # Failed test 'confl_active_logicalslot updated'
> # at C:/prog/bf/root/REL_16_STABLE/pgsql/src/test/recovery/t/035_standby_logical_decoding.pl line 235.
> # Tests were run but no plan was declared and done_testing() was not seen.
> # Looks like your test exited with 255 just after 24.
>
> This has been happening for some time, in all three branches where
> that test script exists. The oldest failure that looks like that in
> the v16 branch is
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2024-09-06%2004%3A19%3A35
>
> However, there are older failures showing a timeout of
> 035_standby_logical_decoding.pl that don't provide any detail, but
> might well be the same thing. The oldest one of those is from
> 2024-05-01, which is still considerably later than the test script
> itself (added on 2023-04-08). So it would seem that this is something
> we broke during 2024, rather than an aboriginal problem in this test.
>
> A search of the buildfarm logs did not turn up similar failures
> on any other animals.
>
> I have no idea how to proceed on narrowing down the cause...
>

Please take a look at the list of such failures since 2024-06-01 that I
collected here:
https://wiki.postgresql.org/wiki/Known_Buildfarm_Test_Failures#035_standby_logical_decoding_standby.pl_fails_due_to_missing_activeslot_invalidation

That page also links to a discussion of the failure:
https://www.postgresql.org/message-id/657815a2-5a89-fcc1-1c9d-d77a6986bc26@gmail.com
(In short, I observed that the test is affected by bgwriter's activity.)

Best regards,
Alexander Lakhin
Neon (https://neon.tech)
