From: | "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com> |
---|---|
To: | Noah Misch <noah(at)leadboat(dot)com>, Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, fabriziomello(at)gmail(dot)com, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Rahila Syed <rahila(dot)syed(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Minimal logical decoding on standbys |
Date: | 2023-04-11 08:55:43 |
Message-ID: | e217cdaf-53a1-6d41-59e2-4f4a8bbe9f23@gmail.com |
Lists: | pgsql-hackers |
Hi,
On 4/11/23 10:20 AM, Drouvot, Bertrand wrote:
> Hi,
>
> On 4/11/23 7:36 AM, Noah Misch wrote:
>> On Fri, Apr 07, 2023 at 11:12:26AM -0700, Andres Freund wrote:
>>> --- /dev/null
>>> +++ b/src/test/recovery/t/035_standby_logical_decoding.pl
>>> @@ -0,0 +1,720 @@
>>> +# logical decoding on standby : test logical decoding,
>>> +# recovery conflict and standby promotion.
>> ...
>>> +$node_primary->append_conf('postgresql.conf', q{
>>> +wal_level = 'logical'
>>> +max_replication_slots = 4
>>> +max_wal_senders = 4
>>> +log_min_messages = 'debug2'
>>> +log_error_verbosity = verbose
>>> +});
>>
>> Buildfarm member hoverfly stopped reporting in when this test joined the tree.
>> It's currently been stuck here for 140 minutes:
>>
>
> Thanks for the report!
>
> It's looping on:
>
> 2023-04-11 02:57:52.516 UTC [62718288:5] 035_standby_logical_decoding.pl LOG: 00000: statement: SELECT restart_lsn IS NOT NULL
> FROM pg_catalog.pg_replication_slots WHERE slot_name = 'promotion_inactiveslot'
>
> And the reason is that the slot is not being created:
>
> $ grep "CREATE_REPLICATION_SLOT" 035_standby_logical_decoding_standby.log | tail -2
> 2023-04-11 02:57:47.287 UTC [9241178:15] 035_standby_logical_decoding.pl STATEMENT: CREATE_REPLICATION_SLOT "otherslot" LOGICAL "test_decoding" ( SNAPSHOT 'nothing')
> 2023-04-11 02:57:47.622 UTC [9241178:23] 035_standby_logical_decoding.pl STATEMENT: CREATE_REPLICATION_SLOT "otherslot" LOGICAL "test_decoding" ( SNAPSHOT 'nothing')
>
> Not sure why the slot is not being created.
>
> There is also "replication apply delay" increasing:
>
> 2023-04-11 02:57:49.183 UTC [13304488:253] DEBUG: 00000: sendtime 2023-04-11 02:57:49.111363+00 receipttime 2023-04-11 02:57:49.183512+00 replication apply delay 644 ms transfer latency 73 ms
> 2023-04-11 02:57:49.184 UTC [13304488:259] DEBUG: 00000: sendtime 2023-04-11 02:57:49.183461+00 receipttime 2023-04-11 02:57:49.1842+00 replication apply delay 645 ms transfer latency 1 ms
> 2023-04-11 02:57:49.221 UTC [13304488:265] DEBUG: 00000: sendtime 2023-04-11 02:57:49.184166+00 receipttime 2023-04-11 02:57:49.221059+00 replication apply delay 682 ms transfer latency 37 ms
> 2023-04-11 02:57:49.222 UTC [13304488:271] DEBUG: 00000: sendtime 2023-04-11 02:57:49.221003+00 receipttime 2023-04-11 02:57:49.222144+00 replication apply delay 683 ms transfer latency 2 ms
> 2023-04-11 02:57:49.222 UTC [13304488:277] DEBUG: 00000: sendtime 2023-04-11 02:57:49.222095+00 receipttime 2023-04-11 02:57:49.2228+00 replication apply delay 684 ms transfer latency 1 ms
>
> Noah, I think hoverfly is yours, would it be possible to have access (I'm not an AIX expert though) or check if you see a slot creation hanging and if so why?
>
Well, we can see in 035_standby_logical_decoding_standby.log:
2023-04-11 02:57:49.180 UTC [62718258:5] [unknown] FATAL: 3D000: database "testdb" does not exist
While, on the primary:
2023-04-11 02:57:48.505 UTC [62718254:5] 035_standby_logical_decoding.pl LOG: 00000: statement: CREATE DATABASE testdb
The TAP test is doing:
"
##################################################
# Test standby promotion and logical decoding behavior
# after the standby gets promoted.
##################################################
$node_standby->reload;
$node_primary->psql('postgres', q[CREATE DATABASE testdb]);
$node_primary->safe_psql('testdb', qq[CREATE TABLE decoding_test(x integer, y text);]);
# create the logical slots
create_logical_slots($node_standby, 'promotion_');
"
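So it looks like the standby had not yet replayed the CREATE DATABASE when the test tried to connect to testdb there to create the slots. I think we might want to add: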
$node_primary->wait_for_replay_catchup($node_standby);
before calling the slot creation.
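For illustration, here is a sketch of how that section of the test would read with the extra call in place. It only shows the intended ordering and is not a copy of hoverfly.patch. It is a fragment rather than a standalone script: it assumes $node_primary, $node_standby and the create_logical_slots() helper are already defined earlier in 035_standby_logical_decoding.pl, as in the excerpt above.

```perl
# Fragment of 035_standby_logical_decoding.pl; $node_primary, $node_standby
# and create_logical_slots() are assumed to be defined earlier in that file.
$node_standby->reload;

$node_primary->psql('postgres', q[CREATE DATABASE testdb]);
$node_primary->safe_psql('testdb',
	qq[CREATE TABLE decoding_test(x integer, y text);]);

# Wait until the standby has replayed what the primary has written so far,
# so that "testdb" exists on the standby before we connect to it.
$node_primary->wait_for_replay_catchup($node_standby);

# create the logical slots
create_logical_slots($node_standby, 'promotion_');
```

Without the wait, whether the slot creation succeeds depends on how quickly the standby replays the CREATE DATABASE record, which would explain why only a slow buildfarm member showed the problem.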
It's done in the attached patch; would it be possible to give it a try, please?
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
hoverfly.patch | text/plain | 667 bytes |