Re: Mysterious performance degradation in exceptional cases

From: Matthias Apitz <guru(at)unixarea(dot)de>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Mysterious performance degradation in exceptional cases
Date: 2022-09-15 05:33:49
Message-ID: YyK5PTCoSFXUzOs5@c720-r368166
Lists: pgsql-general

On Wednesday, September 14, 2022 at 07:19:31 AM -0700, Adrian Klaver wrote:

> On 9/14/22 01:31, Matthias Apitz wrote:
> >
> > We have an application server written in C which uses ESQL/C on top
> > of PostgreSQL 13.1 on Linux. The application in question always serves
> > the same search in a library database, given to the server
> > as commands over the network which log in to the application and run
> > a search:
> >
> > SLNPServerInit
> > User:zfl
> > SLNPEndCommand
> >
> > SLNPSearch
> > HitListName:Zfernleihe
> > Search:1000=472214284
> > SLNPEndCommand
> >
> > To fulfill the search, the application server has to do some 100
> > ESQL/C calls and all this should not take longer than 1-2 seconds, and
> > normally it does not. But in 10% of the cases it takes
> > longer than 180 seconds. The other 90% are below 2 seconds,
> > i.e. the behaviour is binary: either 2 seconds or more than 180 seconds, with no values in between.
> >
> > We can easily simulate the above with a small shell script just sending over
> > the above two commands with 'netcat' and throwing away its result (the real search is
> > done by inter-library loan software which has a timeout of 180 seconds
> > to wait for the SLNPSearch search result -- that's why we got to know
> > about the problem at all, because all this is running automagically with
> > no user dialogs). The idea of the simulated search was to get to know
> > with the ESQL/C log files which operation takes so long and why.
>
> Does the test search run the inter library loan software?

The real picture is:

ILL-software --(network, search command)---> app-server --(ESQL/C)--> PostgreSQL-server
test search --(localhost, search command)-> app-server --(ESQL/C)--> PostgreSQL-server

> > Well, for some time now, primarily to catch the situation, we have been sending
> > these simulated searches every 10 seconds, and since then the problem has gone away entirely.
>
> To be clear the problem went away for the real search?

Yes. Since the 'test search' has been running every 10 seconds, the
'ILL-software' pictured above, which does the same search, no longer
faces the problem.
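
For illustration, a minimal sketch of such a test search in Python (the real
one is a small shell script using netcat). The two commands are the ones
quoted above; the function name, the host and port arguments, the blank line
between the commands and the half-close after sending are assumptions, not
details of the actual script or of the SLNP protocol.

```python
import socket
import time

# Commands as quoted earlier in this thread. The blank separator line and
# the trailing newline are assumptions about the framing.
SEARCH_REQUEST = (
    "SLNPServerInit\n"
    "User:zfl\n"
    "SLNPEndCommand\n"
    "\n"
    "SLNPSearch\n"
    "HitListName:Zfernleihe\n"
    "Search:1000=472214284\n"
    "SLNPEndCommand\n"
)


def timed_search(host, port, timeout=180.0):
    """Send the search, discard the reply, and return the elapsed seconds.

    Returns None if a socket operation times out. Note that the timeout
    applies to each connect/send/recv call, not to the search as a whole.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(SEARCH_REQUEST.encode("ascii"))
            # Assumption: signal end of input the way netcat does on EOF.
            sock.shutdown(socket.SHUT_WR)
            # Throw the result away; we only care how long it takes.
            while sock.recv(4096):
                pass
    except socket.timeout:
        return None
    return time.monotonic() - start
```

Calling timed_search() from a loop that sleeps 10 seconds between calls and
logs the returned value would give one timing per search, so the slow cases
can be lined up against the ESQL/C log files.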

>
> Where is the inter library software, in your application or are you reaching
> out to another application?

The 'app-server' above fulfills the search requested by the
'ILL-software' (or the 'test search'), i.e. it looks up one single
library record (one row in the PostgreSQL database) and delivers
it to the 'ILL-software'. The requests from the 'ILL-software' are not
a heavy load: roughly 50 requests per day.

> Is the search running across a remote network?

The real search comes over the network through a stunnel. But we
watched the incoming search and the response of the 'app-server'
locally with tcpdump. In the timeout case, the 'app-server' does not
answer within 180 seconds, i.e. it does not send anything into the stunnel,
and the remote 'ILL-software' terminates the connection with a FIN packet.

I will now:

- shut down the 10-second test search to see if the problem reappears;
- set 'log_autovacuum_min_duration = 0' in postgresql.conf to see whether
  the times of the problem match autovacuum runs.
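
Once autovacuum actions are logged, comparing the times could look roughly
like the sketch below. It assumes the log timestamps and elapsed times have
already been extracted from the PostgreSQL log by hand or by another script;
the function name and the (logged_at, elapsed_seconds) tuple layout are
hypothetical. The autovacuum log entry is written when the run finishes, so
the start is taken as the log time minus the elapsed time.

```python
from datetime import datetime, timedelta


def vacuums_overlapping(search_start, vacuums,
                        search_duration=timedelta(seconds=180)):
    """Return the autovacuum runs that overlap one slow search.

    search_start    -- datetime at which the slow search began
    vacuums         -- iterable of (logged_at, elapsed_seconds) tuples,
                       where logged_at is the datetime of the log entry
                       (hypothetical layout, extracted beforehand)
    search_duration -- how long the search is assumed to have been stuck
    """
    search_end = search_start + search_duration
    hits = []
    for logged_at, elapsed_seconds in vacuums:
        vacuum_start = logged_at - timedelta(seconds=elapsed_seconds)
        # Two intervals overlap if each starts before the other ends.
        if vacuum_start <= search_end and logged_at >= search_start:
            hits.append((logged_at, elapsed_seconds))
    return hits
```

An empty result for every slow search would suggest that autovacuum is not
the cause and that the search has to continue elsewhere.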

Thanks for your feedback in any case.

matthias

--
Matthias Apitz, ✉ guru(at)unixarea(dot)de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
