Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
Date: 2024-06-08 04:00:00
Message-ID: 657815a2-5a89-fcc1-1c9d-d77a6986bc26@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello Bertrand and Michael,

23.01.2024 11:07, Bertrand Drouvot wrote:
> On Tue, Jan 23, 2024 at 02:50:06PM +0900, Michael Paquier wrote:
>
>> Anyway, that's not the end of it. What should we do for snapshot
>> snapshot records coming from the bgwriter?
> What about?
>
> 3) depending on how stabilized this test (and others that suffer from "random"
> xl_running_xacts) is, then think about the bgwriter.

A recent buildfarm failure [1] reminds me of that remaining question.
Here we have a slow machine (a successful run, for example [2], shows
541.13s duration of the test) and the following information logged:

[13:55:13.725](34.411s) ok 25 - inactiveslot slot invalidation is logged with vacuum on pg_class
[13:55:13.727](0.002s) not ok 26 - activeslot slot invalidation is logged with vacuum on pg_class
[13:55:13.728](0.001s) #   Failed test 'activeslot slot invalidation is logged with vacuum on pg_class'
#   at C:/prog/bf/root/HEAD/pgsql/src/test/recovery/t/035_standby_logical_decoding.pl line 229.
[14:27:42.995](1949.267s) # poll_query_until timed out executing this query:
# select (confl_active_logicalslot = 1) from pg_stat_database_conflicts where datname = 'testdb'
# expecting this output:
# t
# last actual query output:
# f
# with stderr:
[14:27:42.999](0.004s) not ok 27 - confl_active_logicalslot updated
[14:27:43.000](0.001s) #   Failed test 'confl_active_logicalslot updated'
#   at C:/prog/bf/root/HEAD/pgsql/src/test/recovery/t/035_standby_logical_decoding.pl line 235.
Timed out waiting confl_active_logicalslot to be updated at
C:/prog/bf/root/HEAD/pgsql/src/test/recovery/t/035_standby_logical_decoding.pl line 235.

---
035_standby_logical_decoding_standby.log:
2024-06-06 13:55:07.715 UTC [9172:7] LOG:  invalidating obsolete replication slot "row_removal_inactiveslot"
2024-06-06 13:55:07.715 UTC [9172:8] DETAIL:  The slot conflicted with xid horizon 754.
2024-06-06 13:55:07.715 UTC [9172:9] CONTEXT:  WAL redo at 0/4020A80 for Heap2/PRUNE_ON_ACCESS: snapshotConflictHorizon:
754, isCatalogRel: T, nplans: 0, nredirected: 0, ndead: 1, nunused: 0, dead: [48]; blkref #0: rel 1663/16384/2610, blk 0
2024-06-06 13:55:14.372 UTC [7532:1] [unknown] LOG:  connection received: host=127.0.0.1 port=55328
2024-06-06 13:55:14.381 UTC [7532:2] [unknown] LOG:  connection authenticated: identity="EC2AMAZ-P7KGG90\\pgrunner"
method=sspi
(C:/prog/bf/root/HEAD/pgsql.build/testrun/recovery/035_standby_logical_decoding/data/t_035_standby_logical_decoding_standby_data/pgdata/pg_hba.conf:2)
2024-06-06 13:55:14.381 UTC [7532:3] [unknown] LOG:  connection authorized: user=pgrunner database=postgres
application_name=035_standby_logical_decoding.pl
2024-06-06 13:55:14.443 UTC [7532:4] 035_standby_logical_decoding.pl LOG:  statement: select (confl_active_logicalslot =
1) from pg_stat_database_conflicts where datname = 'testdb'
2024-06-06 13:55:14.452 UTC [7532:5] 035_standby_logical_decoding.pl LOG:  disconnection: session time: 0:00:00.090
user=pgrunner database=postgres host=127.0.0.1 port=55328
# (there is no `invalidating obsolete replication slot "row_removal_activeslot"` message)
...
2024-06-06 14:27:42.675 UTC [4032:4] 035_standby_logical_decoding.pl LOG:  statement: select (confl_active_logicalslot =
1) from pg_stat_database_conflicts where datname = 'testdb'
2024-06-06 14:27:42.681 UTC [4032:5] 035_standby_logical_decoding.pl LOG:  disconnection: session time: 0:00:00.080
user=pgrunner database=postgres host=127.0.0.1 port=58713
2024-06-06 14:27:43.095 UTC [7892:2] FATAL:  could not receive data from WAL stream: server closed the connection
unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

It's hard to determine from this info, why row_removal_activeslot was not
invalidated, but running this test on a slowed down Windows VM, I (still)
get the same looking failures caused by RUNNING_XACTS appeared just before
`invalidating obsolete replication slot "row_removal_inactiveslot"`.
So I would consider this failure as yet another result of bgwriter activity
and add it to the list of known failures as such...

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2024-06-06%2012%3A36%3A11
[2] https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=drongo&dt=2024-06-05%2017%3A03%3A13&stg=misc-check

Best regards,
Alexander

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message jian he 2024-06-08 04:14:53 Re: altering a column's collation leaves an invalid foreign key
Previous Message Thomas Munro 2024-06-07 22:36:45 Re: Assert in heapgettup_pagemode() fails due to underlying buffer change