From: | Enrico Schenone <eschenone(at)cleistech(dot)it> |
---|---|
To: | Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org |
Cc: | Massimo Catti <mcatti(at)cleistech(dot)it>, Livio Pizzolo <lpizzolo(at)cleistech(dot)it> |
Subject: | Re: Intermittent errors when fetching cursor rows on PostgreSQL 16 |
Date: | 2025-01-13 08:45:53 |
Message-ID: | 382a1eec-2069-4010-bbdb-37260a1a53a7@cleistech.it |
Lists: | pgsql-general |
Hello, Adrian.
As I mentioned a few days ago, I set up a kind of stress test in the
production environment.
I wrote a program that creates a temporary table, loads 2049 rows into
it from a baseline_table and finally declares two nested cursors.
The first cursor is on the temp table as parent, while the second is on
a lookup table as child.
The program logic is adapted from a fragment common to several
production programs that were failing on cursors, and is intended as a
POC only.
The program was written both in pure C with libpq (see attached
source program) and in the 4Js Genero language.
Each program was executed by a shell script loop that ran the program
10 times each minute, with a 1-second sleep between runs (see attachment).
An automatic scheduler continuously submitted 4 parallel tasks (two
for the C version and two for the 4Js version).
The test was started on Dec 29, 2024 and was kept running for many
days directly in the production environment.
In total, nearly a billion child test cursors were executed while all
other production tasks were running (normally 20 to 30 concurrent batch
services on a pool of 100).
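
As a rough cross-check of that total, the load can be estimated from
the figures above. The sketch below assumes that every one of the 2049
parent rows opens exactly one child cursor and that each of the 4 tasks
really completes 10 runs per minute; both are assumptions, not measured
values, and the function name is made up for illustration.

```c
#include <stdio.h>

/* Child cursors opened per day, assuming one child cursor per parent
 * row and a steady rate of runs_per_minute for every task. */
long long child_cursors_per_day(int tasks, int runs_per_minute,
                                int parent_rows)
{
    const long long minutes_per_day = 24LL * 60LL;
    return (long long)tasks * runs_per_minute * parent_rows
           * minutes_per_day;
}
```

Under those assumptions, child_cursors_per_day(4, 10, 2049) gives
118,022,400 child cursors per day, so about eight days of continuous
running comes to roughly 944 million, which is in line with "nearly a
billion".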
And, well, I'm quite confused: no error at all was detected, not only
in the test programs but in the whole production system. The error
had completely disappeared.
Then I stopped the four stress-test tasks, leaving all other services
running for a week, and again there was no error at all.
No setup was changed, no server was rebooted, and no infrastructure was
upgraded during the test period.
As a result, at the moment I understand neither why and where the
error was occurring, nor why it has disappeared.
Anyone should feel free to give me their opinion.
For the moment I'll run no further tests unless the error comes
knocking at my door again.
*Enrico Schenone*
Software Architect
*Cleis Tech s.r.l.* - www.gruppocleis.it
Sede di Genova, Via Paolo Emilio Bensa, 2 - 16124 Genova, ITALY
Tel: +39-0104071400 Fax: +39-0104073276
Mobile: +39-320 7709352
E-mail: eschenone(at)cleistech(dot)it
<https://gruppocleis.it><https://ibm.biz/BdqAJh>
On 26/12/24 00:20, Adrian Klaver wrote:
> On 12/24/24 14:23, Enrico Schenone wrote:
>> Hi, Adrian.
>> I'm arranging a test program with two nested cursors in two versions:
>>
>> 1. 4Js Genero BDL language
>> 2. pure C with libpq language
>>
>> I'll run both programs under stress in the production
>> environment and watch for some hours how they behave.
>> Possible combinations are:
>>
>> 1. neither version throws an error
>> 2. only the 4Js Genero version throws an error
>> 3. only the pure C version throws an error
>> 4. both versions throw the error
>>
>> This stress test should guide further investigations.
>> I'll keep you informed.
>
> Yes, would like to see how this turns out.
>
>>
>> Regards.
>> Enrico Schenone
>>
>
>
Attachment | Content-Type | Size |
---|---|---|
C-testCursors.c | text/x-csrc | 5.2 KB |
C-testCursors.sh | application/x-shellscript | 279 bytes |