Logical replication failed with SSL SYSCALL error

From: shaurya jain <12345shaurya(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Logical replication failed with SSL SYSCALL error
Date: 2023-04-15 21:10:57
Message-ID: CAHHJ3NT4KYP_L3On+3hTdfsX+TLQmGf8dLh5a4qCa+Xc0Wmt4w@mail.gmail.com
Lists: pgsql-general pgsql-hackers

Hi Team,

Postgres version: 13.8
Issue: Logical replication failing with SSL SYSCALL error
Priority: High

We are migrating our database using logical replication, and all of a
sudden the errors below started appearing in the source and target logs.
We have not been able to work out the cause from them.

*Logs from Source:-*
LOG: could not send data to client: Connection reset by peer
STATEMENT: COPY public.test TO STDOUT
FATAL: connection to client lost
STATEMENT: COPY public.test TO STDOUT

*Logs from Target:-*
2023-04-15 19:07:02 UTC::@:[1250]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
2023-04-15 19:07:02 UTC::@:[1250]:CONTEXT: COPY test, line 365326932
2023-04-15 19:07:03 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 1250) exited with exit code 1
2023-04-15 19:07:03 UTC::@:[7155]:LOG: logical replication table synchronization worker for subscription "sub_tables_2_180", table "test" has started
2023-04-15 19:12:05 UTC:10.144.19.34(33276):postgres(at)webadmit_staging:[7112]:WARNING: there is no transaction in progress
2023-04-15 19:14:08 UTC:10.144.19.34(33324):postgres(at)webadmit_staging:[6052]:LOG: could not receive data from client: Connection reset by peer
2023-04-15 19:17:23 UTC::@:[2112]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
2023-04-15 19:17:23 UTC::@:[1089]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
2023-04-15 19:17:23 UTC::@:[2556]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 2556) exited with exit code 1
2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 2112) exited with exit code 1
2023-04-15 19:17:23 UTC::@:[505]:LOG: background worker "logical replication worker" (PID 1089) exited with exit code 1
2023-04-15 19:17:23 UTC::@:[7287]:LOG: logical replication apply worker for subscription "sub_tables_2_180" has started
2023-04-15 19:17:23 UTC::@:[7288]:LOG: logical replication apply worker for subscription "sub_tables_3_192" has started
2023-04-15 19:17:23 UTC::@:[7289]:LOG: logical replication apply worker for subscription "sub_tables_1_180" has started

Right after this error, all the other replication slots become inactive for
some time and then come back online, and the COPY command reappears in
pg_stat_activity under a new PID.
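
To make the timing in the target log easier to see, here is a minimal Python sketch that groups the "SSL SYSCALL error" lines by timestamp. It is only an illustration: the function name `ssl_errors_by_timestamp` and the regular expression are my own assumptions based on the log prefix shown above, not anything that ships with Postgres.

```python
import re
from collections import defaultdict

# Error lines copied from the target log above, with wrapped lines joined.
TARGET_LOG = """\
2023-04-15 19:07:02 UTC::@:[1250]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
2023-04-15 19:17:23 UTC::@:[2112]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
2023-04-15 19:17:23 UTC::@:[1089]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
2023-04-15 19:17:23 UTC::@:[2556]:ERROR: could not receive data from WAL stream: SSL SYSCALL error: Connection timed out
"""

# Assumed prefix layout: "<date> <time> UTC:<host>:<user@db>:[<pid>]:<LEVEL>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC:.*?"
    r":\[(?P<pid>\d+)\]:(?P<level>[A-Z]+):\s*(?P<msg>.*)$"
)


def ssl_errors_by_timestamp(log_text):
    """Map each timestamp to the PIDs that logged an SSL SYSCALL error then."""
    grouped = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not follow the assumed prefix
        if match.group("level") == "ERROR" and "SSL SYSCALL error" in match.group("msg"):
            grouped[match.group("ts")].append(int(match.group("pid")))
    return dict(grouped)


if __name__ == "__main__":
    for ts, pids in ssl_errors_by_timestamp(TARGET_LOG).items():
        print(ts, pids)
```

On the excerpt above this shows one worker failing at 19:07:02 and three workers failing in the same second at 19:17:23.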

I have a few questions about this:

1. What is the exact reason for the disconnection? (Some articles blame
memory, others the network.)
2. Will it lead to data inconsistency?
3. Does the COPY command running under the new PID copy the whole test
table from the beginning again?

Please help; we are stuck here.
--
Thanks and Regards,
Shaurya Jain
email:- 12345shaurya(at)gmail(dot)com
*Mobile:- +91-8802809405*
LinkedIn:- https://www.linkedin.com/in/shaurya-jain-74353023
