Il 13/01/25 18:26, Adrian Klaver ha scritto:
> On 1/13/25 08:59, Enrico Schenone wrote:
>>
>> Il 13/01/25 17:19, Adrian Klaver ha scritto:
>>> On 1/13/25 00:45, Enrico Schenone wrote:
>>>> Hello, Adrian.
>>>> As I said days ago, I have arranged a kind of stress test in
>>>> production environment.
>>>> I wrote a program that loads a temporary table, loads 2049 rows
>>>> into them from a baseline_table and finally declare two nested
>>>> cursors.
>>>> The first cursor is on the temp table as parent while the second is
>>>> on a lookup table as child.
>>>>
>>>> The program logic is the transposition of one fragment of several
>>>> production programs that was failing on cursors, and has to be
>>>> intended as a POC only.
>>>>
>>>
>>>
>>>> And Well, I'm quite confused: no error at all has been detected,
>>>> not only on the test programs but in the whole production system.
>>>> The error was completely disappeared.
>>>>
>>>> Then I have stopped the four tasks of the stress test leaving all
>>>> other services running for a week, and again no error at all.
>>>>
>>>> No setup was changed nor servers was rebooted, nor infrastructure
>>>> has been upgraded during the test period.
>>>
>>> You are absolutely sure about the above?
>> I can say Yes. All test operations has been logged and verified
>> against the Postgresql log.
>> The only component not under my control is the Provider's
>> Infrastructure, but the infrastructure admin ensured me that no
>> operation at all has been made. I beleave him because it is a
>> reliable tecnician end a well known person.
>
> In your OP you stated:
>
> "Production environments can be:
>
> * Distinct application server and DB server on distinct subnets (no
> dropped packet detected on firewall, no memory/disk/network failure
> detected by "nmon" tool)
> * Distinct application server and DB server on same subnet (no
> firewall)
> * Same server for PostgreSQL and applications
> "
>
> In all those cases are the various servers all running completely
> within the providers infrastructure?
>
No, the second production environment is On Premises at the customer
Datacenter under the same vmware hypervisor.
I'll make more investigations on second and third environments by
increasing the verbosity of both DB and Application logs.
>
>>> Errors that 'fix' themselves are the most frustrating kind, as you
>>> know in the back of your mind they will likely pop up again.
>> True, knocking again to my door ... I still can't beleave.
>
> Going forward one of three things are likely to happen:
>
> 1) The error never shows again.
>
> 2) It does show up again but in a manner that allows it to be traced.
>
> 3) The worst case, it plays hide and seek as previously.
>
>
In our standard applications it normally plays hide because normally our
programs doesn't apply TRY/CATCH paradigm on FETCH.
In 4Js Genero BDL a fetch can return only "success" or "not found"
status, so normally we don't catch for any exception unless we want to
detect locks.
Furthermore for standard cursors we don't use the OPEN/FETCH/CLOSE
statements but simply a FOREACH ... END FOREACH statement that
masquerades in a more easy way the OPEN/FETCH/CLOSE process.
We detected the problem looking at Postgresql log and investigating at
client side for any program exception or abort in FOREACH/END FOREACH
block by increasing log verbosity and applying the OPEN/FETCH/CLOSE and
TRY/CATCH paradigm on suspect cursors.
>
>> Thanks a lot for your interest in sharing my strange experience.
>> Best regards.
>> Enrico
>>
>> *Enrico Schenone*
>> Software Architect
>
*Enrico Schenone*
Software Architect