Patroni question

From: "Zwettler Markus (OIZ)" <Markus(dot)Zwettler(at)zuerich(dot)ch>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Patroni question
Date: 2022-09-22 11:04:33
Message-ID: d1ee012b1c9c4367ade1e8662e80a0dc@zuerich.ch
Lists: pgsql-general

We had a failover.
I read the Patroni logs below as follows:

2022-09-21 11:13:56,384 secondary sent an HTTP GET request to the primary. This failed with a read timeout.
2022-09-21 11:13:56,792 secondary promoted itself to primary
2022-09-21 11:13:57,279 primary sent an HTTP GET request to the secondary. An exception happened, probably also due to a read timeout.
2022-09-21 11:13:57,983 primary demoted itself

So the failover was caused by a network timeout between primary and secondary.
QUESTION 1: Do you agree?
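
To make the ordering of events across both hosts easier to check, the two log files can be merged into a single timeline. The following is only a minimal sketch: the helper names `parse_log` and `merge_timeline` are made up for illustration, the line format is assumed to be the "timestamp LEVEL: message" prefix seen in the excerpts below, and the sample input is a handful of lines copied from those excerpts.

```python
import re
from datetime import datetime

# Assumed line format: "YYYY-MM-DD HH:MM:SS,mmm LEVEL: message",
# as seen in the Patroni log excerpts in this mail.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ([A-Z]+):\s?(.*)$"
)


def parse_log(host, text, levels=("INFO", "WARNING", "ERROR")):
    """Return (timestamp, host, level, message) tuples for the wanted levels."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # traceback lines and "..." carry no timestamp
        stamp, level, message = match.groups()
        if level in levels:
            when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
            events.append((when, host, level, message))
    return events


def merge_timeline(*logs):
    """Merge per-host event lists into one list ordered by timestamp."""
    return sorted((event for log in logs for event in log), key=lambda e: e[0])


# Sample input: a few lines copied from the two log excerpts below.
primary_log = """\
2022-09-21 11:13:57,558 WARNING: Exception happened during processing of request from 10.9.132.16:49080
2022-09-21 11:13:57,965 ERROR: failed to update leader lock
2022-09-21 11:13:57,983 INFO: Demoting self (immediate-nolock)
"""
standby_log = """\
2022-09-21 11:13:54,381 DEBUG: Starting new HTTP connection (1): szhm49346.global.szh.loc:8009
2022-09-21 11:13:56,792 INFO: promoted self to leader by acquiring session lock
2022-09-21 11:13:56,798 INFO: cleared rewind state after becoming the leader
"""

timeline = merge_timeline(
    parse_log("szhm49346", primary_log),
    parse_log("szhm49345", standby_log),
)
for when, host, level, message in timeline:
    print(when.strftime("%H:%M:%S.%f")[:-3], host, level, message)
```

DEBUG lines are dropped by default; passing `levels=("DEBUG", "INFO", "WARNING", "ERROR")` to `parse_log` keeps them, which would also show the etcd requests in the merged view. The comparison is only meaningful if the clocks on both hosts are in sync.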

I thought that Patroni nodes do not communicate with each other directly, but only through the DCS?
QUESTION 2: Is this no longer correct?

===========================

patroni version: 2.1.3

===========================

Patroni Logfile of Host szhm49346 (IP 10.9.132.13) => Primary until Failover
...
...
2022-09-21 11:13:57,279 DEBUG: API thread: 10.9.132.16 - - "GET /patroni HTTP/1.1" 200 - latency: 2245.090 ms
2022-09-21 11:13:57,378 ERROR:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/patroni/dcs/etcd.py", line 566, in wrapper
retval = func(self, *args, **kwargs) is not None
File "/usr/lib/python3.6/site-packages/patroni/dcs/etcd.py", line 696, in _update_leader
return self.retry(self._client.write, self.leader_path, self._name, prevValue=self._name, ttl=self._ttl)
File "/usr/lib/python3.6/site-packages/patroni/dcs/etcd.py", line 447, in retry
return retry(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/patroni/utils.py", line 334, in __call__
return func(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/etcd/client.py", line 500, in write
response = self.api_execute(path, method, params=params)
File "/usr/lib/python3.6/site-packages/patroni/dcs/etcd.py", line 257, in api_execute
return self._handle_server_response(response)
File "/usr/lib/python3.6/site-packages/etcd/client.py", line 987, in _handle_server_response
etcd.EtcdError.handle(r)
File "/usr/lib/python3.6/site-packages/etcd/__init__.py", line 306, in handle
raise exc(msg, payload)
etcd.EtcdCompareFailed: Compare failed : [pcl_p011(at)szhm49346 != pcl_p011(at)szhm49345]
2022-09-21 11:13:57,558 WARNING: Exception happened during processing of request from 10.9.132.16:49080
2022-09-21 11:13:57,965 ERROR: failed to update leader lock
2022-09-21 11:13:57,983 INFO: Demoting self (immediate-nolock)
2022-09-21 11:13:58,214 WARNING: Traceback (most recent call last):
File "/usr/lib64/python3.6/socketserver.py", line 654, in process_request_thread
self.finish_request(request, client_address)
File "/usr/lib64/python3.6/socketserver.py", line 364, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/usr/lib64/python3.6/socketserver.py", line 724, in __init__
self.handle()
File "/usr/lib64/python3.6/http/server.py", line 418, in handle
self.handle_one_request()
File "/usr/lib/python3.6/site-packages/patroni/api.py", line 652, in handle_one_request
BaseHTTPRequestHandler.handle_one_request(self)
File "/usr/lib64/python3.6/http/server.py", line 406, in handle_one_request
method()
File "/usr/lib/python3.6/site-packages/patroni/api.py", line 198, in do_GET_patroni
self._write_status_response(200, response)
File "/usr/lib/python3.6/site-packages/patroni/api.py", line 94, in _write_status_response
self._write_json_response(status_code, response)
File "/usr/lib/python3.6/site-packages/patroni/api.py", line 53, in _write_json_response
self._write_response(status_code, json.dumps(response, default=str), content_type='application/json')
File "/usr/lib/python3.6/site-packages/patroni/api.py", line 50, in _write_response
self.wfile.write(body.encode('utf-8'))
File "/usr/lib64/python3.6/socketserver.py", line 803, in write
self._sock.sendall(b)
BrokenPipeError: [Errno 32] Broken pipe
...
...

===========================

Patroni Logfile of Host szhm49345 (IP 10.9.132.16) => Standby until Failover
...
...
2022-09-21 11:13:54,381 DEBUG: Starting new HTTP connection (1): szhm49346.global.szh.loc:8009
2022-09-21 11:13:56,384 WARNING: Request failed to pcl_p011(at)szhm49346: GET http://szhm49346.global.szh.loc:8009/patroni (HTTPConnectionPool(host='szhm49346.global.szh.loc', port=8009): Max retries exceeded with url: /patroni (Caused by ReadTimeoutError("HTTPConnectionPool(host='szhm49346.global.szh.loc', port=8009): Read timed out. (read timeout=2)",)))
2022-09-21 11:13:56,484 DEBUG: Writing pcl_p011(at)szhm49345 to key /patroni/pcl_p011/leader ttl=30 dir=False append=False
2022-09-21 11:13:56,485 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2022-09-21 11:13:56,562 DEBUG: http://10.7.211.13:2379 "PUT /v2/keys/patroni/pcl_p011/leader HTTP/1.1" 201 197
2022-09-21 11:13:56,562 DEBUG: Issuing read for key /patroni/pcl_p011/ with args {'recursive': True, 'retry': <patroni.utils.Retry object at 0x7fcbb0d0c2b0>}
2022-09-21 11:13:56,563 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2022-09-21 11:13:56,634 DEBUG: http://10.7.211.13:2379 "GET /v2/keys/patroni/pcl_p011/?recursive=true HTTP/1.1" 200 None
2022-09-21 11:13:56,635 DEBUG: Writing {"leader":"pcl_p011(at)szhm49345","sync_standby":null} to key /patroni/pcl_p011/sync ttl=None dir=False append=False
2022-09-21 11:13:56,635 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2022-09-21 11:13:56,713 DEBUG: http://10.7.211.13:2379 "PUT /v2/keys/patroni/pcl_p011/sync HTTP/1.1" 200 368
2022-09-21 11:13:56,713 DEBUG: Writing {"conn_url":"postgres://szhm49345.global.szh.loc:5432/pcl_p011","api_url":"http://szhm49345.global.szh.loc:8009/patroni","state":"running","role":"replica","version":"2.1.3","checkpoint_after_promote":false,"xlog_location":9087609453816,"timeline":6} to key /patroni/pcl_p011/members/pcl_p011(at)szhm49345 ttl=30 dir=False append=False
2022-09-21 11:13:56,714 DEBUG: Converted retries value: 0 -> Retry(total=0, connect=None, read=None, redirect=0, status=None)
2022-09-21 11:13:56,791 DEBUG: http://10.7.211.13:2379 "PUT /v2/keys/patroni/pcl_p011/members/pcl_p011(at)szhm49345 HTTP/1.1" 200 896
2022-09-21 11:13:56,792 INFO: promoted self to leader by acquiring session lock
2022-09-21 11:13:56,798 INFO: cleared rewind state after becoming the leader
...
...
