From: | Дмитрий <dsolik(at)mail(dot)ru> |
---|---|
To: | Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com> |
Cc: | pgsql-general <pgsql-general(at)postgresql(dot)org> |
Subject: | Re[2]: FATAL: could not send data to WAL stream: lost synchronization with server: got message type "0", length 892351284 |
Date: | 2025-01-28 13:09:07 |
Message-ID: | 1738069747.975077615@f745.i.mail.ru |
Lists: | pgsql-general |
Colleagues confirmed that the problem is with the network between data centers. Thank you!
Sunday, 26 January 2025, 20:33 +03:00, from Adrian Klaver adrian(dot)klaver(at)aklaver(dot)com:
>On 1/26/25 03:29, Дмитрий wrote:
> "How was it shut down, on purpose or a hardware/software issue?"
> - I reboot the receiver every 2 minutes on purpose. I determined this
> time empirically, because replication breaks down approximately every
> minute and a half. The reboot helps to advance the receiver.
>
> "Also do you have corresponding logs from primary?"
> - Attached to this message.
>
> "Unless, is there cascading replication going on?"
> - No, this is replication from the leader. The leader has two
> replicas of its own, all in one data center. The problematic
> replica is needed for migrating to another data center.
>
> "Was that a manual intervention?"
> - Yes, reboot on schedule, every two minutes.
>
> "Is that what is shown above or have you restarted since the above and
> the server is running?"
> - Sometimes replication works without problems for several hours. But
> when a breakdown occurs, rebooting every two minutes helps to catch up
> with this replica.
>1) It would make life easier if the log line prefix timestamp were
>set to the same precision on primary and standby. As of now it looks like
>the primary has %t (timestamp without milliseconds) and the standby has
>%m (timestamp with milliseconds).
>
>2) From the logs.
>
>Primary:
>
>2025-01-26 12:21:27 MSK [656]: [11-1]
>app=v-host-n1,user=replicator,db=[unknown],client=192.168.5.1 STATEMENT:
> START_REPLICATION SLOT "slot_migration_to_rcod" 106B6/52000000 TIMELINE 61
>
>2025-01-26 12:21:27 MSK [656]: [12-1]
>app=v-host-n1,user=replicator,db=[unknown],client=192.168.5.1 LOG:
>disconnection: session time: 0:01:05.329 user=replicator database=
>host=192.168.5.1 port=58380
>
>
>Standby:
>
>2025-01-26 12:21:27.113 MSK [10824] FATAL: could not send data to WAL
>stream: lost synchronization with server: got message type "0", length
>825373235
>
>
>Do you know what is issuing START_REPLICATION SLOT?
>
>
> Another interesting point. In addition to this replication, there are
> two more, to the same data center. One of them had the same problem,
> but a one-time restart solved it, and that replication is still
> working normally. The other has had no such problems and has been
> working since its launch, more than a month ago.
>
> --
>
>
>
>--
>Adrian Klaver
>adrian(dot)klaver(at)aklaver(dot)com
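A side note on the error text itself. In the PostgreSQL wire protocol each message starts with a one-byte type followed by a 4-byte length in network (big-endian) byte order. The sketch below assumes the lengths reported in the FATAL lines (892351284 in the subject, 825373235 in the standby log) are simply four bytes of the stream read as that length field; the helper name `length_field_bytes` is hypothetical and used only for illustration. Under that assumption both values turn out to be ASCII digits, as is the message type "0", which is consistent with the receiver having lost message framing and reading payload bytes as a header. It does not by itself show where the stream was damaged.

```python
import struct


def length_field_bytes(length: int) -> bytes:
    """Return the four big-endian bytes that were interpreted as an Int32 length.

    Hypothetical helper for illustration; it only reverses the integer
    conversion and knows nothing about the replication stream itself.
    """
    return struct.pack(">i", length)


# Lengths quoted in the thread: subject line and standby log.
for reported in (892351284, 825373235):
    raw = length_field_bytes(reported)
    # bytes.isdigit() is True when every byte is an ASCII decimal digit.
    print(reported, raw, raw.isdigit())
```

If the bytes had decoded to something that looked like a plausible small length, a framing problem would be less likely; here both decode to printable digits, which is what one would expect when text-like data is read where a message header should be.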