Re: connection timeout with psycopg2

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Vicente Juan Tomas Monserrat <vicens(dot)tomas(at)bluekiri(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: connection timeout with psycopg2
Date: 2019-10-18 14:20:12
Message-ID: 98404060-71d7-f4c6-9059-9d7dd4800a26@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 10/16/19 2:29 AM, Vicente Juan Tomas Monserrat wrote:
> Hi there,
>
> I have been testing out the following architecture for PostgreSQL HA.
>
> |+---------+ +-----+ VIP +----+ | +---------+ | | | +------v-------+
> +------v-------+ | pgBouncer | | pgBouncer | | + | | + | | keepalived |
> | keepalived | +------+-------+ +------+-------+ | | | | | |
> +------v-------+ +------v-------+ | | | | | HAProxy | | HAProxy | | | |
> | +------+-------+ +------+-------+ | | +--------------------+ | | | |
> +----v----+ +----v----+ | | | | | | | | | PG01 | | PG02 | | | | |
> |(patroni)| |(patroni)| | | | | +---------+ +---------+ |
>
> I'm using this python script for checking the failover events in
> pgBouncer, HAProxy and Patroni (PostgreSQL HA solution).
>
> |#! /usr/bin/env python # -*- coding: utf-8 -*- # vim:fenc=utf-8 import
> psycopg2 ISOLEVEL = psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT
> import time from datetime import datetime user = 'postgres' password =
> 'secretpassword' host = 'localhost' port = '6432' database = 'test'
> LIMIT_RETRIES = 10 class DB(): def __init__(self, user, password, host,
> port, database, reconnect): self.user = user self.password = password
> self.host = host self.port = port self.database = database
> self._connection = None self._cursor = None self.reconnect = reconnect
> self.init() def connect(self,retry_counter=0): if not self._connection:
> try: self._connection = psycopg2.connect(user = self.user, password =
> self.password, host = self.host, port = self.port, database =
> self.database, connect_timeout = 3) retry_counter = 0
> self._connection.autocommit = True return self._connection except
> psycopg2.OperationalError as error: if not self.reconnect or
> retry_counter >= LIMIT_RETRIES: raise error else: retry_counter += 1
> print("got error {}. reconnecting {}".format(str(error).strip(),
> retry_counter)) time.sleep(5) self.connect(retry_counter) except
> (Exception, psycopg2.Error) as error: raise error def cursor(self): if
> not self._cursor or self._cursor.closed: if not self._connection:
> self.connect() self._cursor = self._connection.cursor() return
> self._cursor def execute(self, query, retry_counter=0): try:
> self._cursor.execute(query) retry_counter = 0 except
> (psycopg2.DatabaseError, psycopg2.OperationalError) as error: if
> retry_counter >= LIMIT_RETRIES: raise error else: retry_counter += 1
> print("got error {}. retrying {}".format(str(error).strip(),
> retry_counter)) time.sleep(1) self.reset() self.execute(query,
> retry_counter) except (Exception, psycopg2.Error) as error: raise error
> def reset(self): self.close() self.connect() self.cursor() def
> close(self): if self._connection: if self._cursor: self._cursor.close()
> self._connection.close() print("PostgreSQL connection is closed")
> self._connection = None self._cursor = None def init(self):
> self.connect() self.cursor() db = DB(user=user, password=password,
> host=host, port=port, database=database, reconnect=True)
> db.execute("create table if not exists t1 (id integer);") i = 0 while
> True: db.execute("insert into t1(id) values(1);") if i % 100 == 0:
> print("%s: %d" % (datetime.now(), i)) i = i+1 |
>
> When running this python script against the pgBouncer VIP it keeps
> inserting data into the database. Then I stop one of the HAProxy
> service (where the VIP lives) the connection it hangs and never goes on.
> The VIP is on the other node but the client/app it doesn't notice and it
> keeps waiting for 5 minutes and finally continues. I've been looking for
> some default value of 5min with no luck.

Observations and comments:
1) I would point out there is a Psycopg list:

https://www.postgresql.org/list/psycopg/

2) I am not sure where the 5 minutes comes in. I see LIMIT_RETRIES = 10
and a sleep of 5 sec between retries.

3) Where did this:

"got error server conn crashed? "

come from? I don't see that in the code.

>
> |$ python insert.py 2019-10-15 10:01:51.817585: 0 2019-10-15
> 10:01:51.901091: 100 2019-10-15 10:01:52.031583: 200 2019-10-15
> 10:01:52.126565: 300 2019-10-15 10:01:52.216502: 400 2019-10-15
> 10:01:52.307157: 500 2019-10-15 10:01:52.400867: 600 2019-10-15
> 10:01:52.497239: 700 2019-10-15 10:01:52.655689: 800 2019-10-15
> 10:01:52.777883: 900 got error server conn crashed? <<<<<<<<<<<<<<<<<
> HAProxy stopped manually to force the VIP to move to the other node
> server closed the connection unexpectedly This probably means the server
> terminated abnormally before or while processing the request.. retrying
> 1 PostgreSQL connection is closed ^C^C^C^C^C <<<<<<<<<<<<<<<<<< The
> connection gets stuck (kill PID) |
>
> I've tried exactly the same code logic in Java (using PostgreSQL JDBC)
> and dotnet core (using Npgsql) works fine with specifying this
> parameters socketTimeout (Java) and 'Command Timeout' (dotnet) respectively.
>
> |$ dotnet run connection initialized 2019-10-15T08:27:28.843 0
> 2019-10-15T08:27:30.205 100 2019-10-15T08:27:31.566 200 got error:
> Exception while reading from stream. Retrying 1 connection closed
> connection initialized connection reset 2019-10-15T08:27:42.076 300
> <<<<<<<<<<<< the app notices something wrong with the existing
> connection and does a reset 2019-10-15T08:27:43.461 400
> 2019-10-15T08:27:44.843 500 2019-10-15T08:27:46.244 600
> 2019-10-15T08:27:47.637 700 2019-10-15T08:27:49.031 800 ^C |
>
> In python and using psycopg2 (v2.8.3) I've not been able to inform in
> any way the application to reset the connection, retry and keep going on.
>
> As the psycopg2 it's a wrapper for libpq, I've seen that in libpq (for
> postgresql 12) there is a new option called |tcp_user_timeout|. Note
> this options doesn't exist in previous versions of libpq.
> So I've tried to install psycopg2 (instead of psycopg2-binary) with the
> libpq (v12) installed, so the psycopg2 is build against libpq (v12).
> Even so, it's not working as expected.
>
> Environment:
> Ubuntu 16.04 LTS
> PostgreSQL 9.6.15
> Patroni 1.6.0
> pgBouncer 1.11.0
> keepalived 2.0.18
> HAProxy 1.6.3
>
> I've also tried to catch different types of exceptions with no luck.
>
> I would appreciate any guidance on this matter. I can give you more info
> if needed.
>
> Thank!
>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Matthias Apitz 2019-10-18 14:42:26 Re: DBD::Pg exorts char columns with trailing blanks
Previous Message Lizeth Solis Aramayo 2019-10-18 13:07:23 RE: CVE-2018-1058