From: | "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com> |
---|---|
To: | Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | RE: Logical replication timeout problem |
Date: | 2022-01-20 09:05:09 |
Message-ID: | OS3PR01MB627527AD69BACB66F944928A9E5A9@OS3PR01MB6275.jpnprd01.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Jan 19, 2022 at 9:53 PM Fabrice Chapuis fabrice636861(at)gmail(dot)com<mailto:fabrice636861(at)gmail(dot)com> wrote:
> Hello Amit,
> If it takes little work for you, can you please send me a piece of code
> with the change needed to do the test
I wrote a patch(Send-keepalive.patch, please refer to attachment) according to
Amit's suggestions. But after I did some simple test about this patch by the
test script "test.sh"(please refer to attachment), I found the timeout problem
has not been fixed by this patch.
So I add some logs(please refer to Add-some-logs-to-debug.patch) to confirm newly
added WalSndKeepaliveIfNecessary() send keepalive message or not.
After applying the Send-keepalive.patch and Add-some-logs-to-debug.patch, I
found that the added message "send keep alive message" was not printed in
publisher-side log.
[publisher-side log]:
2022-01-20 15:21:50.057 CST [2400278] LOG: checkpoint complete: wrote 61 buffers (0.4%); 0 WAL file(s) added, 0 removed, 0 recycled; write=9.838 s, sync=0.720 s, total=10.559 s; sync files=4, longest=0.563 s, average=0.180 s; distance=538053 kB, estimate=543889 kB
2022-01-20 15:21:50.977 CST [2400278] LOG: checkpoints are occurring too frequently (11 seconds apart)
2022-01-20 15:21:50.977 CST [2400278] HINT: Consider increasing the configuration parameter "max_wal_size".
2022-01-20 15:21:50.988 CST [2400278] LOG: checkpoint starting: wal
2022-01-20 15:21:52.853 CST [2400404] LOG: begin load changes
2022-01-20 15:21:52.853 CST [2400404] STATEMENT: START_REPLICATION SLOT "sub" LOGICAL 0/0 (proto_version '3', publication_names '"pub"')
2022-01-20 15:22:52.969 CST [2410649] ERROR: replication slot "sub" is active for PID 2400404
2022-01-20 15:22:52.969 CST [2410649] STATEMENT: START_REPLICATION SLOT "sub" LOGICAL 0/0 (proto_version '3', publication_names '"pub"')
2022-01-20 15:22:57.980 CST [2410657] ERROR: replication slot "sub" is active for PID 2400404
[subscriber-side log]:
2022-01-20 15:16:10.975 CST [2400335] LOG: checkpoint starting: time
2022-01-20 15:16:16.052 CST [2400335] LOG: checkpoint complete: wrote 51 buffers (0.3%); 0 WAL file(s) added, 0 removed, 0 recycled; write=4.830 s, sync=0.135 s, total=5.078 s; sync files=39, longest=0.079 s, average=0.004 s; distance=149 kB, estimate=149 kB
2022-01-20 15:22:52.738 CST [2400400] ERROR: terminating logical replication worker due to timeout
2022-01-20 15:22:52.738 CST [2400332] LOG: background worker "logical replication worker" (PID 2400400) exited with exit code 1
2022-01-20 15:22:52.740 CST [2410648] LOG: logical replication apply worker for subscription "sub" has started
2022-01-20 15:22:52.969 CST [2410648] ERROR: could not start WAL streaming: ERROR: replication slot "sub" is active for PID 2400404
2022-01-20 15:22:52.970 CST [2400332] LOG: background worker "logical replication worker" (PID 2410648) exited with exit code 1
2022-01-20 15:22:57.977 CST [2410656] LOG: logical replication apply worker for subscription "sub" has started
It seems WalSndKeepaliveIfNecessary did not send keepalive message in testing. I
am still doing some research about the cause.
Attach the patches and test script mentioned above, in case someone wants to try.
If I miss something, please let me know.
Regards,
Wang wei
Attachment | Content-Type | Size |
---|---|---|
Send-keepalive.patch | application/octet-stream | 1.7 KB |
Add-some-logs-to-debug.patch | application/octet-stream | 1.3 KB |
test.sh | application/octet-stream | 1.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2022-01-20 09:31:15 | Re: pg_upgrade should truncate/remove its logs before running |
Previous Message | kuroda.hayato@fujitsu.com | 2022-01-20 08:21:42 | RE: [Proposal] Add foreign-server health checks infrastructure |