Intermittent errors when fetching cursor rows on PostgreSQL 16

From: Enrico Schenone <eschenone(at)cleistech(dot)it>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Cc: Massimo Catti <mcatti(at)cleistech(dot)it>, Livio Pizzolo <lpizzolo(at)cleistech(dot)it>
Subject: Intermittent errors when fetching cursor rows on PostgreSQL 16
Date: 2024-12-17 16:30:16
Message-ID: 446423eb-4a4e-4135-bbb8-4d0e5c7aac3b@cleistech.it
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Good day.
My name is Enrico Schenone, from Genoa, Italy.
I'm a software achitect working at Cleis Tech - Genoa - Italy  -
http://gruppocleis.it
Me and my team are using PostgreSQL v12 to v16 on Debian 10-12 and
Ubuntu Server 22.04 LTS with no-cluster configuration.
Our applications are developed with 4Js Genero platform (classified as
LCAP) - https://4js.com

I whish to report an issue where I can't say if it happens at server or
client side (or both as well).
The problem occurs occasionally and only when fetching rows from a
server-side cursor. The related query may be complex with joins or very
easy (just one static table with 86 rows without WHERE conditions).
I have set the "debug5" verbosity level of PostgreSQL log and I have
extracted from millions of log line those who are belonging to separate
failing sessions/connections.
At the same time I have extracted the related application log.
For each failure reported into the client-side application log, I have a
distinct PostgreSQL detailed log.

Then I have merged client-side end server-side logs along the timeline
and I have observed what client and server does.
For example (S: means PostgreSQL Server log, while C: means Client log):

|S||:|
|||2024-12-16 17:27:14.*406* CET [2214722] cleistech(at)hh24odds_prod -
192.168.16.179000006*76054e0.21cb42* STATEMENT: *fetch forward 50 from cu6*

||C:|
|ERROR  ; 2024-12-16 17:27:14.*407*; PID: 104257;  User: genero; Ricerca
quote evento     1433958 fallita. General SQL error, check
SQLCA.SQLERRD[2]. - *SQLSTATE: XX001*|
|
S:|
|2024-12-16 17:27:14.*407* CET [2214722] cleistech(at)hh24odds_prod -
192.168.16.17908006*676054e0.21cb42* LOG: ||*08006*||*: could not
receive data from client*: Connection reset by peer |
|2024-12-16 17:27:14.*407* CET [2214722] cleistech(at)hh24odds_prod -
192.168.16.17908006*676054e0.21cb42* LOCATION:  pq_recvbuf, pqcomm.c:953 |
|2024-12-16 17:27:14.*407* CET [2214722] cleistech(at)hh24odds_prod -
192.168.16.17908003*676054e0.21cb42* DEBUG: *08003*: unexpected EOF on
client connection

||C:|
|ERROR  ; 2024-12-16 17:27:14.*408*; PID: 104257;  User: genero;
ver_quote: ERRORE in foreach ricerca bettype con spread. *SQLSTATE:
XX000* - SQLCODE:       -6372 -          -1 - *no connection to the
server* - abbandono validazione. |

Before failing on the reported cursor, the program succesfully creates
and uses other cursors.
When the issue is detected at client-side, the program terminates
without any abort and it is re-instantiated in seconds or minutes by a
scheduler.
After a variable time (normally some minutes) and several failures it
goes to normal end without errors.

What I reported in the body of this mail is only a subset of postgreSQL
and application logs. I can send several log files each reporting a
distinct and complete connection ID history.
I tried to reproduce the issue on a "in-vitro" environment, with
single-to-multiple instances of the same program (up to 99 parallel
instances) and I have executed more than half million of test processes
without errors.
This problem commonly happens only in production environments.

Production environments can be:

* Distinct application server and DB server on distinct subnets (no
dropped packet detected on firewall, no memory/disk/network failure
detected by "nmon" tool)
* Distinct application server and DB server on same subnet (no firewall)
* Same server for PostgreSQL and applications

The VM running the PostgreSQL that I have used for my test is an Ubuntu
Server 22.04 LTS with 16 CPUs and 64 GB of RAM.
For client applications I use Ubuntu Server 22.04 LTS.
The postgresql.conf file is attached to this e-mail.

I'm able to detect that there is an error but I really becomes mad
trying to find it. It seems a phantom that I know to exist but I can't
bring up.
I kindly ask you to help me understand what and where is the problem,
and how to solve it.

Hoping you can help me or address to someone who can do it.
Thanks in advance.
Enrico
--

*Enrico Schenone*
Software Architect

*Cleis Tech s.r.l.* - www.gruppocleis.it
Sede di Genova, Via Paolo Emilio Bensa, 2 - 16124 Genova, ITALY
Tel: +39-0104071400   Fax: +39-0104073276
Mobile: +39-320 7709352
E-mail: eschenone(at)cleistech(dot)it

<https://gruppocleis.it><https://ibm.biz/BdqAJh>

<https://ibm.biz/BdqAJh>
<https://ibm.biz/BdqAJh>

Attachment Content-Type Size
postgresql.conf text/plain 29.6 KB

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Will Storey 2024-12-17 18:03:44 Re: Disabling vacuum truncate for autovacuum
Previous Message Rama Krishnan 2024-12-17 13:41:55 Query about pg_wal directory filled up