Re: PostgreSQL 10.5 : Logical replication timeout results in PANIC in pg_wal "No space left on device"

From: Achilleas Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>
To: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: PostgreSQL 10.5 : Logical replication timeout results in PANIC in pg_wal "No space left on device"
Date: 2018-11-13 16:06:41
Message-ID: 48415b2a-ea0f-cf5e-1145-17e8797a6e79@matrix.gatewaynet.com
Lists: pgsql-admin

On 13/11/18 5:35 PM, Rui DeSousa wrote:
>
>
>> On Nov 13, 2018, at 7:00 AM, Achilleas Mantzios <achill(at)matrix(dot)gatewaynet(dot)com> wrote:
>>
>> Is there a way for the WAL receiver to not have detected the termination of the replication stream?
>
> The teardown of the network socket on the upstream server should send a reset packet to the downstream server, and at that point the WAL receiver would close its connection.  Are there any firewalls,
> routers, rules, etc. between the nodes that could have dropped the packet?

No

>
>>
>> Shouldn't the WAL receiver normally detect this and try again after wal_retrieve_retry_interval?
>
> Not really… if the connection has already been torn down, the upstream server would send another reset packet on the next request, and in that case it would.  However, if request packets are not
> reaching the upstream server, i.e. due to a firewall silently dropping the packets (personally I believe a firewall should always send reset packets to friendly hosts), then what happens is the TCP/IP
> send queue builds up with the request packets instead.  At that point you are waiting on the OS to terminate the connection, which can take a day or two depending on your TCP/IP settings.
>

Again no dropping, no firewall.

> What you want to use instead is wal_receiver_timeout, to detect the case where the upstream server either no longer exists or a firewall, etc. is silently dropping packets.

Once again, from my original message:
"while setting up logical replication since August we had seen early on the need to increase max_receiver_timeout and max_sender_timeout from 60sec to 5mins"

So with wal_receiver_timeout='5 min', the receiver never detected any timeout.
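
For reference, a minimal sketch of how the settings discussed in this thread might look in postgresql.conf. The 5-minute values are the ones quoted from the original message (the PostgreSQL parameters are named wal_receiver_timeout and wal_sender_timeout). The retry interval shown is simply the PostgreSQL default, and the split between subscriber and publisher is an assumption about a typical logical replication layout, not a copy of the actual configuration:

```conf
# Subscriber (downstream) node -- assumed placement
wal_receiver_timeout = '5min'        # raised from the 60s default
wal_retrieve_retry_interval = '5s'   # PostgreSQL default, shown for illustration only

# Publisher (upstream) node -- assumed placement
wal_sender_timeout = '5min'          # raised from the 60s default
```

With values like these, a dead or silent upstream would be expected to trip the receiver timeout after about five minutes, which is exactly what was not observed here.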

>
>

--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
