From: | Vicente Juan Tomas Monserrat <vicens(dot)tomas(at)bluekiri(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | connection timeout with psycopg2 |
Date: | 2019-10-16 09:29:04 |
Message-ID: | CAK525fiv21UDbzDe7BsHEzzAA9ByBGKqCKU0et7NJws2jS50Cw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi there,
I have been testing out the following architecture for PostgreSQL HA.
+---------+
+-----+ VIP +----+
| +---------+ |
| |
+------v-------+ +------v-------+
| pgBouncer | | pgBouncer |
| + | | + |
| keepalived | | keepalived |
+------+-------+ +------+-------+
| |
| |
| |
+------v-------+ +------v-------+
| | | |
| HAProxy | | HAProxy |
| | | |
+------+-------+ +------+-------+
| |
+--------------------+
| |
| |
+----v----+ +----v----+
| | | |
| | | |
| PG01 | | PG02 |
| | | |
|(patroni)| |(patroni)|
| | | |
+---------+ +---------+
I'm using this python script for checking the failover events in pgBouncer,
HAProxy and Patroni (PostgreSQL HA solution).
#! /usr/bin/env python
# -*- coding: utf-8 -*-
# vim:fenc=utf-8
import psycopg2
ISOLEVEL = psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT
import time
from datetime import datetime
user = 'postgres'
password = 'secretpassword'
host = 'localhost'
port = '6432'
database = 'test'
LIMIT_RETRIES = 10
class DB():
def __init__(self, user, password, host, port, database, reconnect):
self.user = user
self.password = password
self.host = host
self.port = port
self.database = database
self._connection = None
self._cursor = None
self.reconnect = reconnect
self.init()
def connect(self,retry_counter=0):
if not self._connection:
try:
self._connection = psycopg2.connect(user = self.user,
password = self.password, host = self.host, port = self.port, database
= self.database, connect_timeout = 3)
retry_counter = 0
self._connection.autocommit = True
return self._connection
except psycopg2.OperationalError as error:
if not self.reconnect or retry_counter >= LIMIT_RETRIES:
raise error
else:
retry_counter += 1
print("got error {}. reconnecting
{}".format(str(error).strip(), retry_counter))
time.sleep(5)
self.connect(retry_counter)
except (Exception, psycopg2.Error) as error:
raise error
def cursor(self):
if not self._cursor or self._cursor.closed:
if not self._connection:
self.connect()
self._cursor = self._connection.cursor()
return self._cursor
def execute(self, query, retry_counter=0):
try:
self._cursor.execute(query)
retry_counter = 0
except (psycopg2.DatabaseError, psycopg2.OperationalError) as error:
if retry_counter >= LIMIT_RETRIES:
raise error
else:
retry_counter += 1
print("got error {}. retrying
{}".format(str(error).strip(), retry_counter))
time.sleep(1)
self.reset()
self.execute(query, retry_counter)
except (Exception, psycopg2.Error) as error:
raise error
def reset(self):
self.close()
self.connect()
self.cursor()
def close(self):
if self._connection:
if self._cursor:
self._cursor.close()
self._connection.close()
print("PostgreSQL connection is closed")
self._connection = None
self._cursor = None
def init(self):
self.connect()
self.cursor()
db = DB(user=user, password=password, host=host, port=port,
database=database, reconnect=True)
db.execute("create table if not exists t1 (id integer);")
i = 0
while True:
db.execute("insert into t1(id) values(1);")
if i % 100 == 0:
print("%s: %d" % (datetime.now(), i))
i = i+1
When running this python script against the pgBouncer VIP it keeps
inserting data into the database. Then I stop one of the HAProxy
service (where the VIP lives) the connection it hangs and never goes on.
The VIP is on the other node but the client/app it doesn't notice and it
keeps waiting for 5 minutes and finally continues. I've been looking for
some default value of 5min with no luck.
$ python insert.py
2019-10-15 10:01:51.817585: 0
2019-10-15 10:01:51.901091: 100
2019-10-15 10:01:52.031583: 200
2019-10-15 10:01:52.126565: 300
2019-10-15 10:01:52.216502: 400
2019-10-15 10:01:52.307157: 500
2019-10-15 10:01:52.400867: 600
2019-10-15 10:01:52.497239: 700
2019-10-15 10:01:52.655689: 800
2019-10-15 10:01:52.777883: 900
got error server conn crashed? <<<<<<<<<<<<<<<<< HAProxy
stopped manually to force the VIP to move to the other node
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.. retrying 1
PostgreSQL connection is closed
^C^C^C^C^C <<<<<<<<<<<<<<<<<< The
connection gets stuck (kill PID)
I've tried exactly the same code logic in Java (using PostgreSQL JDBC) and
dotnet core (using Npgsql) works fine with specifying this parameters
socketTimeout (Java) and 'Command Timeout' (dotnet) respectively.
$ dotnet run
connection initialized
2019-10-15T08:27:28.843 0
2019-10-15T08:27:30.205 100
2019-10-15T08:27:31.566 200
got error: Exception while reading from stream. Retrying 1
connection closed
connection initialized
connection reset
2019-10-15T08:27:42.076 300 <<<<<<<<<<<< the app notices
something wrong with the existing connection and does a reset
2019-10-15T08:27:43.461 400
2019-10-15T08:27:44.843 500
2019-10-15T08:27:46.244 600
2019-10-15T08:27:47.637 700
2019-10-15T08:27:49.031 800
^C
In python and using psycopg2 (v2.8.3) I've not been able to inform in any
way the application to reset the connection, retry and keep going on.
As the psycopg2 it's a wrapper for libpq, I've seen that in libpq (for
postgresql 12) there is a new option called tcp_user_timeout. Note this
options doesn't exist in previous versions of libpq.
So I've tried to install psycopg2 (instead of psycopg2-binary) with the
libpq (v12) installed, so the psycopg2 is build against libpq (v12). Even
so, it's not working as expected.
Environment:
Ubuntu 16.04 LTS
PostgreSQL 9.6.15
Patroni 1.6.0
pgBouncer 1.11.0
keepalived 2.0.18
HAProxy 1.6.3
I've also tried to catch different types of exceptions with no luck.
I would appreciate any guidance on this matter. I can give you more info if
needed.
Thank!
From | Date | Subject | |
---|---|---|---|
Next Message | Luca Ferrari | 2019-10-16 09:38:53 | Re: Regarding db dump with Fc taking very long time to completion |
Previous Message | Durgamahesh Manne | 2019-10-16 09:27:00 | Re: Regarding db dump with Fc taking very long time to completion |