RE: Re:BUG #17392: archiver process exited with exit code 2 was unexpectedly cause for immediate shutdown request

From: Улаев Александр Сергеевич <alexander(dot)ulaev(at)rtlabs(dot)ru>
To: Sergei Kornilov <sk(at)zsrv(dot)org>
Cc: "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, "PG Bug reporting form" <noreply(at)postgresql(dot)org>
Subject: RE: Re:BUG #17392: archiver process exited with exit code 2 was unexpectedly cause for immediate shutdown request
Date: 2022-02-03 10:29:18
Message-ID: ca8f25eb70214fc5b320540375877c2a@rtlabs.ru
Lists: pgsql-bugs

We found the following errors in the syslog on ETCD1:

Feb 1 16:12:01 etcd1 etcd: got unexpected response error (etcdserver: request timed out)
Feb 1 16:12:02 etcd1 etcd: got unexpected response error (etcdserver: request timed out) [merged 1 repeated lines in 1.21s]
Feb 1 16:12:03 etcd1 etcd: got unexpected response error (etcdserver: request timed out) [merged 1 repeated lines in 1s]
Feb 1 16:12:20 etcd1 etcd: sync duration of 29.69857369s, expected less than 1s
Feb 1 16:26:55 etcd1 etcd: got unexpected response error (etcdserver: request timed out)
Feb 1 16:27:03 etcd1 etcd: got unexpected response error (etcdserver: request timed out)
Feb 1 16:27:17 etcd1 etcd: sync duration of 1m0.745329542s, expected less than 1s
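
For anyone who wants to scan a syslog for stalls like these, here is a minimal Python sketch. It is only an illustration: the function names `parse_duration` and `slow_syncs` are made up for this example, and it assumes the message text has exactly the "sync duration of X, expected less than Y" form shown above, with durations built only from h, m, s and ms components.

```python
import re

# Matches etcd lines such as
# "sync duration of 1m0.745329542s, expected less than 1s".
SYNC_RE = re.compile(r"sync duration of ([^,\s]+), expected less than (\S+)")

# One Go-style duration component: a number followed by a unit.
# Assumption: only h, m, s and ms appear; "ms" is tried before "m".
PART_RE = re.compile(r"(\d+(?:\.\d+)?)(h|ms|m|s)")
UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def parse_duration(text):
    """Convert a Go-style duration such as '1m0.745329542s' to seconds."""
    parts = PART_RE.findall(text)
    if not parts:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(float(value) * UNIT_SECONDS[unit] for value, unit in parts)


def slow_syncs(lines):
    """Yield (line, actual_seconds, expected_seconds) for each slow-sync warning."""
    for line in lines:
        match = SYNC_RE.search(line)
        if match:
            actual = parse_duration(match.group(1))
            expected = parse_duration(match.group(2))
            yield line, actual, expected


if __name__ == "__main__":
    sample = [
        "Feb 1 16:12:03 etcd1 etcd: got unexpected response error (etcdserver: request timed out)",
        "Feb 1 16:12:20 etcd1 etcd: sync duration of 29.69857369s, expected less than 1s",
        "Feb 1 16:27:17 etcd1 etcd: sync duration of 1m0.745329542s, expected less than 1s",
    ]
    for line, actual, expected in slow_syncs(sample):
        print(f"{actual:.2f}s (limit {expected:.0f}s): {line}")
```

Lines that do not contain a sync-duration warning, such as the "request timed out" entries, are simply skipped.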

So the problem is related to the SAN: an I/O freeze affected many VMs, including the DB and etcd nodes => long etcd I/O delays => Patroni could not update the leader lock and demoted its own PostgreSQL instance.
Sergey, thank you very much for your support!

Best regards,
Ulaev Alexander

-----Original Message-----
From: Sergei Kornilov [mailto:sk(at)zsrv(dot)org]
Sent: Thursday, February 3, 2022 1:06 PM
To: Улаев Александр Сергеевич <alexander(dot)ulaev(at)rtlabs(dot)ru>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org; PG Bug reporting form <noreply(at)postgresql(dot)org>
Subject: Re:BUG #17392: archiver process exited with exit code 2 was unexpectedly cause for immediate shutdown request

Hello
> 2022-02-01 16:12:24,928 ERROR: failed to update leader lock
> 2022-02-01 16:12:27,063 INFO: demoted self because failed to update leader lock in DCS

Between these two messages, an immediate shutdown is called: https://github.com/zalando/patroni/blob/v2.1.2/patroni/ha.py#L1045

Regards, Sergei
