From: | Fabio Pardi <f(dot)pardi(at)portavita(dot)eu> |
---|---|
To: | Ganesh Korde <ganeshakorde(at)gmail(dot)com> |
Cc: | pgsql-admin(at)lists(dot)postgresql(dot)org |
Subject: | Re: Streaming replication connection break - unexpected EOF on standby connection |
Date: | 2018-07-17 08:14:32 |
Message-ID: | 27b2e7bf-9821-dbbe-2f6c-de414422634f@portavita.eu |
Lists: | pgsql-admin |
Yeah, well done!
Thanks for letting us know and glad my tips helped!
regards,
fabio pardi
On 07/17/2018 10:10 AM, Ganesh Korde wrote:
> Hi,
>
> The issue has finally been resolved; it was a network problem. There
> were two issues, as described below.
>
> 1. We have two links between primary and secondary: a VPN tunnel and
> L2TP. Multiple routes were configured for VPN and L2TP from the
> secondary to the primary. The tunnel and the link were always up.
> But as traffic from the secondary travelled through the tunnel towards
> the primary, the connection was dropped after reaching the
> destination because of route ambiguity between L2TP and the VPN tunnel.
> This issue was resolved by allowing access via the VPN tunnel and
> removing the L2TP route.
>
> 2. After fixing the above, disconnections were still happening, but after
> a specific time interval. This was due to a negotiation time set on the
> IPsec tunnel.
> So they enabled auto-negotiation on the IPsec tunnel, so that it does not
> wait for tunnel initiation from the other end.
>
> So the disconnection issues have been resolved.
>
> Thanks Johannes and Fabio for your help.
>
> Regards,
> Ganesh.
>
> On Tue, Jul 3, 2018 at 5:37 PM Fabio Pardi <f(dot)pardi(at)portavita(dot)eu> wrote:
>
> Hi Ganesh,
>
> the logs you posted show connection timeouts.
>
> With your configuration, in case of a network drop the standby
> server will be the first to notice it. That's because
> wal_receiver_timeout is lower than wal_sender_timeout.
>
> From the documentation:
>
> `wal_receiver_timeout` (`integer`)
>
> Terminate replication connections that are inactive longer than
> the specified number of milliseconds. This is useful for the
> receiving standby server to detect a primary node crash or
> network outage. A value of zero disables the timeout mechanism.
> This parameter can only be set in the `postgresql.conf` file or
> on the server command line. The default value is 60 seconds.
>
>
> That might explain why your secondary server sends an RST.
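
A minimal sketch of the comparison above, in Python. The function name and the example values are hypothetical (the actual settings are not repeated in this message); it only encodes the rule that the side with the lower non-zero timeout is the first to give up on a silent link, and that a value of zero disables the timeout on that side.

```python
def first_to_time_out(wal_receiver_timeout_ms: int, wal_sender_timeout_ms: int) -> str:
    """Return which side gives up first when the replication link goes silent."""
    # Zero disables the timeout, so treat it as "never times out".
    receiver = wal_receiver_timeout_ms or float("inf")
    sender = wal_sender_timeout_ms or float("inf")
    if receiver == sender:
        return "neither" if receiver == float("inf") else "tie"
    # wal_receiver_timeout applies on the standby, wal_sender_timeout on the primary.
    return "standby" if receiver < sender else "primary"


# Hypothetical values, in milliseconds.
print(first_to_time_out(30000, 60000))
```

In practice the exact moment also depends on when each side last heard from its peer, so this is only a rough guide to which log is likely to show the timeout first.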
>
> The RST packets you posted are more a consequence than a cause of your
> problem.
>
> I think the RST is sent to tell the master that the
> connection should be closed because of the timeout.
>
> This fits with the fact that the RST packet with sequence number
> 1232664740 is retransmitted several times, meaning
> that it did not reach its destination at first.
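
To check for repeated RSTs in a capture like the one quoted further down, something along these lines could be used. It is only a sketch: the regular expression assumes a one-record-per-line layout of `timestamp interface direction source -> destination: rst sequence`, and the function name is made up for illustration. The sample records are the ones from the quoted log, with the wrapped lines joined.

```python
import re
from collections import Counter

# Assumed record layout: "<ts> <iface> <in|out> <src> -> <dst>: rst <seq>"
LINE_RE = re.compile(
    r"^(?P<ts>\d+\.\d+)\s+(?P<iface>\S+)\s+(?P<dir>in|out)\s+"
    r"(?P<src>\S+)\s+->\s+(?P<dst>\S+):\s+rst\s+(?P<seq>\d+)$"
)


def count_rst_repeats(lines):
    """Count RST records per (interface, direction, sequence number)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            counts[(m["iface"], m["dir"], m["seq"])] += 1
    return counts


sample = [
    "782.822280 port7 in <Secondary_server_IP>.35918 -> <Primary_server_IP>.5433: rst 1232664740",
    "782.822310 wan2 out <Secondary_server_IP>.35918 -> <Primary_server_IP>.5433: rst 1232664740",
    "782.822313 port7 in <Secondary_server_IP>.35918 -> <Primary_server_IP>.5433: rst 1232664740",
    "782.822315 wan2 out <Secondary_server_IP>.35918 -> <Primary_server_IP>.5433: rst 1232664740",
    "782.822317 port7 in <Secondary_server_IP>.35918 -> <Primary_server_IP>.5433: rst 1232664740",
    "782.822319 wan2 out <Secondary_server_IP>.35918 -> <Primary_server_IP>.5433: rst 1232664740",
    "782.822345 port7 in <Secondary_server_IP>.35918 -> <Primary_server_IP>.5433: rst 1232664740",
]

repeats = count_rst_repeats(sample)
for key, n in sorted(repeats.items()):
    print(key, n)
```

Any key with a count above one means the same RST was seen more than once on that interface and direction.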
>
> I would look more carefully at your network, because I suspect the
> real problem might be there.
>
>
> regards,
>
> fabio pardi
>
>
>
>
> On 03/07/18 09:36, Ganesh Korde wrote:
>> Hi,
>>
>> After analysis, the network team found that packets are being
>> reset by the secondary server. Below are the logs.
>>
>> 782.822280 port7 in <Secondary_server_IP>.35918 ->
>> <Primary_server_IP>.5433: rst 1232664740
>>
>> 782.822310 wan2 out <Secondary_server_IP>.35918 ->
>> <Primary_server_IP>.5433: rst 1232664740
>>
>> 782.822313 port7 in <Secondary_server_IP>.35918 ->
>> <Primary_server_IP>.5433: rst 1232664740
>>
>> 782.822315 wan2 out <Secondary_server_IP>.35918 ->
>> <Primary_server_IP>.5433: rst 1232664740
>>
>> 782.822317 port7 in <Secondary_server_IP>.35918 ->
>> <Primary_server_IP>.5433: rst 1232664740
>>
>> 782.822319 wan2 out <Secondary_server_IP>.35918 ->
>> <Primary_server_IP>.5433: rst 1232664740
>>
>> 782.822345 port7 in <Secondary_server_IP>.35918 ->
>> <Primary_server_IP>.5433: rst 1232664740
>>
>>
>> But they were not able to find out why the secondary is generating reset
>> packets. There are no devices between these servers that can
>> modify the packets.
>> Although the two servers are behind different firewalls, the packets are
>> being reset at the secondary server and not at the firewall level;
>> we can see this in the log.
>>
>> I would like to mention the following points about the application:
>> 1. This connection interruption happens during the day, when the
>> transaction rate is a little higher. During the day, the average
>> rate is 5 transactions per second (inserts and processing).
>> 2. We are not using a connection pool, so each time a request comes in,
>> the app server creates a new connection to the db server, and when
>> processing is done, the app server disconnects.
>>
>> We are now clueless as to why the secondary server is resetting the packets.
>> Any help is highly appreciated.
>>
>> Thanks & Regards,
>> Ganesh.
>>
>>
>>
>>
>> On Thu, Jun 28, 2018 at 3:38 PM Ganesh Korde
>> <ganeshakorde(at)gmail(dot)com> wrote:
>>
>> Hi Johannes,
>>
>> Thanks for your reply. We are using a VPN tunnel between these
>> two hosts. I will check the remaining questions you mentioned
>> with the network team and get back to you.
>>
>> Thanks & Regards,
>> Ganesh.
>>
>> On Wed, Jun 27, 2018 at 6:46 PM Johannes Truschnigg
>> <johannes(at)truschnigg(dot)info> wrote:
>>
>> Hi Ganesh,
>>
>>
>> On Wed, Jun 27, 2018 at 06:37:25PM +0530, Ganesh Korde wrote:
>> > [...]
>> > 1. For what reason does "unexpected EOF on standby connection" occur
>> > on the primary db server?
>> > 2. After a replication disconnection, the secondary should immediately
>> > reconnect to the primary, but it takes some time. What could be the
>> > reason for this?
>>
>> From skimming the log, it seems to me that there is an issue at the
>> socket/network level, which yields the "connection reset by peer" error
>> message.
>>
>> What is the network between these two hosts like? Is it a WAN link; is a VPN
>> or SSH tunnel involved? Do you have other, long-running TCP sessions between
>> these peers, and do they experience similar or other problems? Do the hosts'
>> link-layer stats hint at problems, e.g. packet loss? Do the hosts' kernels
>> leave a message hinting at L2 connectivity problems in their debug ringbuffers
>> (`dmesg`) at the time you observe the replication drop out?
>>
>> --
>> with best regards:
>> - Johannes Truschnigg ( johannes(at)truschnigg(dot)info )
>>
>> www: https://johannes.truschnigg.info/
>> phone: +43 650 2 133337
>> xmpp: johannes(at)truschnigg(dot)info
>>
>> Please do not bother me with HTML-email or attachments.
>> Thank you.
>>
>