From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Cc: | David Pirotte <dpirotte(at)gmail(dot)com>, Euler Taveira <euler(at)timbira(dot)com(dot)br> |
Subject: | subscriptioncheck failure |
Date: | 2021-05-13 09:42:30 |
Message-ID: | CAA4eK1+uW1UGDHDz-HWMHMen76mKP7NJebOTZN4uwbyMjaYVww@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
I noticed $SUBJECT in curculio.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio&dt=2021-05-11%2018%3A30%3A23
The failure test report:
t/020_messages.pl (Wstat: 7424 Tests: 1 Failed: 0)
Non-zero exit status: 29
Parse errors: Bad plan. You planned 5 tests but ran 1.
Files=21, Tests=149, 79 wallclock secs ( 0.03 usr 0.03 sys + 22.36
cusr 9.47 csys = 31.89 CPU)
Result: FAIL
Makefile:21: recipe for target 'check' failed
gmake: *** [check] Error 1
As far as I can understand the failure happens due to error:
2021-05-11 20:41:38.356 CEST [31177:4] 020_messages.pl ERROR:
replication slot "tap_sub" is active for PID 7354
2021-05-11 20:41:38.356 CEST [31177:5] 020_messages.pl STATEMENT:
SELECT get_byte(data, 1), encode(substr(data, 11, 8), 'escape')
FROM pg_logical_slot_peek_binary_changes('tap_sub', NULL, NULL,
'proto_version', '1',
'publication_names', 'tap_pub',
'messages', 'true')
OFFSET 1 LIMIT 1
As per my initial analysis, this happens because in the test after
creating a subscription, we are not waiting for the subscriber to
catch up. I think there is a narrow window where it is possible that
after creating a subscription, the apply worker starts replication
even after we have done Alter Subscription .. DISABLE. This is also
visible in buildfarm logs:
2021-05-11 20:41:38.345 CEST [4046:9] 020_messages.pl STATEMENT:
SELECT get_byte(data, 0)
FROM pg_logical_slot_peek_binary_changes('tap_sub', NULL, NULL,
'proto_version', '1',
'publication_names', 'tap_pub',
'messages', 'true')
...
2021-05-11 20:41:38.354 CEST [7354:6] tap_sub LOG: received
replication command: START_REPLICATION SLOT "tap_sub" LOGICAL 0/0
(proto_version '2', publication_names '"tap_pub"')
Notice that we have checked the messages before receiving the start
replication command from the subscriber. Now, if we would have waited
for the subscription to catch up then our check in pg_stat_replication
wouldn't have given a false positive. I have noticed that we do have
such wait in other subscription tests but somehow missed it here.
Thoughts?
--
With Regards,
Amit Kapila.
From | Date | Subject | |
---|---|---|---|
Next Message | Dilip Kumar | 2021-05-13 09:49:48 | Re: RFC: Logging plan of the running query |
Previous Message | Bharath Rupireddy | 2021-05-13 09:38:50 | Re: RFC: Logging plan of the running query |