Re: PostgreSQL 10.5 : Logical replication timeout results in PANIC in pg_wal "No space left on device"

From: Achilleas Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>
To: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: PostgreSQL 10.5 : Logical replication timeout results in PANIC in pg_wal "No space left on device"
Date: 2018-11-21 06:03:57
Message-ID: b34279c0-14ea-bf3b-61a4-109792059e09@matrix.gatewaynet.com
Lists: pgsql-admin


On 20/11/18 10:48 PM, Rui DeSousa wrote:
>
>
>> On Nov 20, 2018, at 3:34 PM, Achilleas Mantzios
>> <achill(at)matrix(dot)gatewaynet(dot)com> wrote:
>>
>> Hey, I was reading the docs, it seems it means :
>>
>> net.ipv4.tcp_keepalive_time + net.ipv4.tcp_keepalive_intvl *
>> net.ipv4.tcp_keepalive_probes = 2hrs 11 Mins 15 Secs, rather than 18 Hrs
>
> Yeah, that’s correct.  I wonder why it didn’t terminate.

Most probably because there was another clone created (cloud migration
magic). That's my theory, albeit not confirmed by the provider. The logical
worker (walreceiver) was still alive and happy even after the primary
crashed. I have the logs from the other standby, and it immediately
detected the problem (PANIC on the primary) and retried. No firewall was
dropping packets; in every test I did, the logical bgworker detected any
problem *instantly* and retried after 5 secs by default.
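
For reference, the keepalive arithmetic quoted above can be checked with a
short script. This is only a sketch: the function names are made up for
illustration, and the fallback values (7200 s, 75 s, 9 probes) are the usual
Linux defaults, assumed here rather than taken from the affected host.

```python
from datetime import timedelta
from pathlib import Path

# Usual Linux defaults (assumption; the real host may differ).
DEFAULTS = {
    "tcp_keepalive_time": 7200,   # idle seconds before the first probe
    "tcp_keepalive_intvl": 75,    # seconds between probes
    "tcp_keepalive_probes": 9,    # unanswered probes before giving up
}


def keepalive_detection_seconds(time_s, intvl_s, probes):
    """Worst-case time to notice a dead peer: time + intvl * probes."""
    return time_s + intvl_s * probes


def read_sysctl(name, base="/proc/sys/net/ipv4"):
    """Read one keepalive setting, falling back to the assumed default."""
    try:
        return int((Path(base) / name).read_text().strip())
    except (OSError, ValueError):
        return DEFAULTS[name]


if __name__ == "__main__":
    total = keepalive_detection_seconds(
        read_sysctl("tcp_keepalive_time"),
        read_sysctl("tcp_keepalive_intvl"),
        read_sysctl("tcp_keepalive_probes"),
    )
    # With the defaults above this is 7875 s, i.e. 2:11:15.
    print(total, "seconds =", timedelta(seconds=total))
```

With the default values this gives 7875 seconds, matching the 2 hrs 11 mins
15 secs figure rather than 18 hrs.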
