From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Subject: | Test failures of 100_bugs.pl |
Date: | 2023-01-24 03:23:27 |
Message-ID: | 20230124032327.fakewkdgrzmkrr2l@awork3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
cfbot, the buildfarm and locally I have seen 100_bugs.pl fail
occasionally. Just rarely enough that I never got around to looking into it
for real.
Just now there was another failure on master:
https://cirrus-ci.com/task/5279589287591936
[01:00:49.441] ok 1 - index predicates do not cause crash
[01:00:49.441] ok 2 - update to temporary table without replica identity with FOR ALL TABLES publication
[01:00:49.441] ok 3 - update to unlogged table without replica identity with FOR ALL TABLES publication
[01:00:49.441] # test failed
[01:00:49.441] --- stderr ---
[01:00:49.441] # poll_query_until timed out executing this query:
[01:00:49.441] # SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');
[01:00:49.441] # expecting this output:
[01:00:49.441] # t
[01:00:49.441] # last actual query output:
[01:00:49.441] # f
[01:00:49.441] # with stderr:
[01:00:49.441] # Tests were run but no plan was declared and done_testing() was not seen.
[01:00:49.441] # Looks like your test exited with 29 just after 3.
[01:00:49.441]
[01:00:49.441] (test program exited with status code 29)
the regress log:
https://api.cirrus-ci.com/v1/artifact/task/5279589287591936/testrun/build-32/testrun/subscription/100_bugs/log/regress_log_100_bugs
and twoway's log:
https://api.cirrus-ci.com/v1/artifact/task/5279589287591936/testrun/build-32/testrun/subscription/100_bugs/log/100_bugs_twoways.log
We see t2 added to the publication:
2023-01-24 00:57:30.099 UTC [73654][client backend] [100_bugs.pl][7/5:0] LOG: statement: ALTER PUBLICATION testpub ADD TABLE t2
And that *then* "t" was synchronized:
2023-01-24 00:57:30.102 UTC [73640][logical replication worker] LOG: logical replication table synchronization worker for subscription "testsub", table "t" has finished
and then that the refresh was issued:
2023-01-24 00:57:30.128 UTC [73657][client backend] [100_bugs.pl][5/10:0] LOG: statement: ALTER SUBSCRIPTION testsub REFRESH PUBLICATION
And we see a walsender starting and the query to get the new tables being executed:
2023-01-24 00:57:30.139 UTC [73660][walsender] [testsub][6/8:0] LOG: statement: SELECT DISTINCT t.schemaname, t.tablename
, t.attnames
FROM pg_catalog.pg_publication_tables t
WHERE t.pubname IN ('testpub')
And that's it, the rest of the time is just polling.
Perhaps wait_for_subscription_sync() should dump the set of rel states to make
something like this more debuggable?
The fact that the synchronization for t finished just before the refresh makes
me wonder if a wakeup or a cache invalidation got lost?
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Zheng Li | 2023-01-24 03:27:50 | Re: Support logical replication of DDLs |
Previous Message | Peter Geoghegan | 2023-01-24 03:22:18 | Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation |