Re: Momentary Delay

From: Anderson Valadares <andervalbh(at)gmail(dot)com>
To: Bill Moran <wmoran(at)potentialtech(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Momentary Delay
Date: 2015-06-15 20:48:54
Message-ID: CALaY=Xmsd99L51pUS2w4RmBZyji=KuqxsXnQHGe039Usdnd=DA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

2015-06-12 19:56 GMT-03:00 Bill Moran <wmoran(at)potentialtech(dot)com>:

>
> Please do not remove the mailing list from replies. See below.
>
> On Fri, 12 Jun 2015 09:21:19 -0300
> Anderson Valadares <andervalbh(at)gmail(dot)com> wrote:
>
> > 2015-06-08 20:33 GMT-03:00 Bill Moran <wmoran(at)potentialtech(dot)com>:
> >
> > > On Mon, 8 Jun 2015 11:59:31 -0300
> > > Anderson Valadares <andervalbh(at)gmail(dot)com> wrote:
> > >
> > > > Hi
> > > > We are experiencing an intermittent problem in a GIS database from a
> > > > client. Some times of the day is as if the PostgreSQL executed the
> > > slowest
> > > > operations see below an example. The query was performed three times,
> > > twice
> > > > I canceled and the third time I left it to end. The consultation
> took 10
> > > > seconds to finish, but the measured execution time is 20 ms. As we
> have
> > > > several queries that run every 2 seconds when this momentary delay
> occurs
> > > > queries accumulate and burst the limit of 203 connections allowed.
> The
> > > > interval of "momentary stop" are approximately 2 seconds occurring at
> > > > random times and during these stoppages occur no locks and no
> increased
> > > > load on the server is identified. There is a job 2/2 seconds locks
> > > > collection of information, running queries, etc., nmon also collects
> > > > information every 2 seconds. My client asks what causes these
> momentary
> > > > stops? because it affects all operations of the database? etc. How
> do I
> > > > identify what is causing these delays in executions of operations in
> the
> > > > database?
> > >
> > > Number of points to consider:
> > > * Check the contents of pg_locks and see if something is waiting on a
> > > lock when the query is slow.
> >
> > There is nothing waiting when the query is slow, see:
> >
> > snap_datetime | waiting | count
> > -------------------------------+---------+-------
> > 2015-06-05 09:25:00.954731-03 | f | 74
> > 2015-06-05 09:26:00.249187-03 | f | 205
> > 2015-06-05 09:27:00.826874-03 | f | 207
>
> I don't know what that means. Since you don't show the query that
> generated that output, I have no idea if your statement is valid, or
> if you're running a query that will inherently produce incorrect
> results.
>

Sorry the information was incomplete. Below is the query that was
performed to extract information from tables that used to monitor the
database.
The snap_datetime column indicates the time that occurred monitoring,
the waiting column tells you whether any connection was on hold and the
column
count tells how many connections existed at the time.

select
s.snap_datetime
,a.waiting
, count(*)
from stat_snapshot s, stat_activity a
where s.snap_id = a.snap_id
and s.snap_datetime >= '2015-06-05 09:25:00'
and s.snap_datetime <= '2015-06-05 09:28:00'
group by
s.snap_datetime
,a.waiting
order by s.snap_datetime
;
snap_datetime | waiting | count
-------------------------------+---------+-------
2015-06-05 09:25:00.954731-03 | f | 74
2015-06-05 09:26:00.249187-03 | f | 205
2015-06-05 09:27:00.826874-03 | f | 207
(3 rows)

> > * Also, ensure log_lock_waits is turned on for the server and check
> > > PostgreSQL's logs for messages about long lock waits.
> >
> > There is also nothing
>
> This is even worse than the previous statement. What did you _do_ to
> determine this?
>
> Since you're unable to diagnose and fix the problem on your own, the
> possibility exists that the problem is with your diagnostic steps. If
> you don't share the details of those steps, I have no way to help you
> with them.
>

See the log in
https://github.com/andervalbh/serv-gis01/blob/master/postgresql-2015-06-05_000000.log

> > > * Based on the connection behavior you describe, I'm guessing it's a
> > > Tomcat app using some sort of conection pool. Whatever it is, tune
> > > your connection pool settings so that the max size of the pool
> doesn't
> > > exceed the available PostgreSQL connections. At least that will
> prevent
> > > errors from happening when the problem occurs.
> >
> > Yes it is a tomcat application using connection pooling and will be set
> to
> > not exceed the limit of PostgreSQL connections, but the central point is
> > that the number of connections has increased due to what I am calling
> > "momentary stoppages"
>
> I understand that. I made the suggestion as a temporary fix to reduce
> the impact until you are able to fix the actual cause.
>
> > > * The query you you keep cancelling below, run just EXPLAIN on it (not
> > > EXPLAIN ANALYZE) which will certainly finish and give you a plan
> > > that can be reviewed to help determine what the problem is.
> >
> > Explain plan in http://explain.depesz.com/s/bWw
>
> I'm confused. The query at that URL only took 10ms to complete, which
> doesn't seem at all unreasonable.
>
>
I agree with you and that's why I ask why the delay in the execution at
certain times
as demonstrated above.

> > * On the OS, monitor iostat or something else to see if you're
> > > saturating disk capacity when the problem happens.
> >
> > The OS is also being monitored and there occurs no saturation.
>
> Again, without any details, it's possible that there is a problem
> in your monitoring.
>
>
See the monitoring spreadsheet generated from nmon in
https://github.com/andervalbh/serv-gis01/blob/master/transfacil07_150605_0000.nmon.rar

>
> > > The Server is a IBM P720 128G RAM PostgreSQL 9.2.9 on
> > > > powerpc-ibm-aix7.1.0.0, compiled by gcc (GCC) 4.6.4, 64-bit
> > > >
> > > > Evolution of the number of connections for a period
> > > >
> > > > snap_datetime | #connections
> > > > ----------------------------------------------+-------------
> > > > 2015-06-05 09:25:00.954731-03 | 74
> > > > 2015-06-05 09:26:00.249187-03 | 205
> > > > 2015-06-05 09:27:00.826874-03 | 207
> > > > 2015-06-05 09:28:00.374666-03 | 73
> > > > 2015-06-05 09:29:00.690696-03 | 75
> > > >
> > > >
> > > > Occurrence of the problem
> > > >
> > > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > > dvi_cod_dados_via_iti,
> > > > dvi_sub.via_cod_viagem
> > > > from gis_dvi_dados_viagem_itinerario dvi_sub,
> gis_via_viagem
> > > > via_sub, gis_iti_itinerario iti_sub
> > > > where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > > > and via_sub.via_status_viagem = 'A'
> > > > and via_sub.via_dt_hora_ini > now() - interval '9
> hours'
> > > > and iti_sub.lin_cod_linha = 389
> > > > and iti_sub.iti_sentido_itinerario = 'I'
> > > > and iti_sub.iti_cod_itinerario_linha =
> > > > via_sub.iti_cod_itinerario_linha
> > > > and dvi_sub.vlo_cod_localizacao is not null
> > > > group by dvi_sub.via_cod_viagem,
> > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > > ;
> > > > Cancel request sent
> > > > ERROR: canceling statement due to user request
> > > > Time: 2555.557 ms
> > > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > > dvi_cod_dados_via_iti,
> > > > dvi_sub.via_cod_viagem
> > > > from gis_dvi_dados_viagem_itinerario dvi_sub,
> gis_via_viagem
> > > > via_sub, gis_iti_itinerario iti_sub
> > > > where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > > > and via_sub.via_status_viagem = 'A'
> > > > and via_sub.via_dt_hora_ini > now() - interval '9
> hours'
> > > > and iti_sub.lin_cod_linha = 389
> > > > and iti_sub.iti_sentido_itinerario = 'I'
> > > > and iti_sub.iti_cod_itinerario_linha =
> > > > via_sub.iti_cod_itinerario_linha
> > > > and dvi_sub.vlo_cod_localizacao is not null
> > > > group by dvi_sub.via_cod_viagem,
> > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > > ;
> > > > Cancel request sent
> > > > ERROR: canceling statement due to user request
> > > > Time: 2192.572 ms
> > > > citgis=# explain analyze select max(dvi_sub.dvi_cod_dados_via_iti) as
> > > > dvi_cod_dados_via_iti,
> > > > dvi_sub.via_cod_viagem
> > > > from gis_dvi_dados_viagem_itinerario dvi_sub,
> gis_via_viagem
> > > > via_sub, gis_iti_itinerario iti_sub
> > > > where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > > > and via_sub.via_status_viagem = 'A'
> > > > and via_sub.via_dt_hora_ini > now() - interval '9
> hours'
> > > > and iti_sub.lin_cod_linha = 389
> > > > and iti_sub.iti_sentido_itinerario = 'I'
> > > > and iti_sub.iti_cod_itinerario_linha =
> > > > via_sub.iti_cod_itinerario_linha
> > > > and dvi_sub.vlo_cod_localizacao is not null
> > > > group by dvi_sub.via_cod_viagem,
> > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > > ;
> > > > Cancel request sent
> > > > ERROR: canceling statement due to user request
> > > > Time: 10164.015 ms
> > > > citgis=# select now();
> > > > now
> > > > -------------------------------
> > > > 2015-06-05 09:27:22.006072-03
> > > > (1 row)
> > > >
> > > > Time: 0.152 ms
> > > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > > dvi_cod_dados_via_iti,
> > > > dvi_sub.via_cod_viagem
> > > > from gis_dvi_dados_viagem_itinerario dvi_sub,
> gis_via_viagem
> > > > via_sub, gis_iti_itinerario iti_sub
> > > > where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > > > and via_sub.via_status_viagem = 'A'
> > > > and via_sub.via_dt_hora_ini > now() - interval '9
> hours'
> > > > and iti_sub.lin_cod_linha = 389
> > > > and iti_sub.iti_sentido_itinerario = 'I'
> > > > and iti_sub.iti_cod_itinerario_linha =
> > > > via_sub.iti_cod_itinerario_linha
> > > > and dvi_sub.vlo_cod_localizacao is not null
> > > > group by dvi_sub.via_cod_viagem,
> > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > > ;
> > > > dvi_cod_dados_via_iti | via_cod_viagem
> > > > -----------------------+----------------
> > > > 1059964443 | 7989813
> > > > 1060072723 | 7990876
> > > > (2 rows)
> > > >
> > > > Time: 5565.175 ms
> > > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > > dvi_cod_dados_via_iti,
> > > > dvi_sub.via_cod_viagem
> > > > from gis_dvi_dados_viagem_itinerario dvi_sub,
> gis_via_viagem
> > > > via_sub, gis_iti_itinerario iti_sub
> > > > where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > > > and via_sub.via_status_viagem = 'A'
> > > > and via_sub.via_dt_hora_ini > now() - interval '9
> hours'
> > > > and iti_sub.lin_cod_linha = 389
> > > > and iti_sub.iti_sentido_itinerario = 'I'
> > > > and iti_sub.iti_cod_itinerario_linha =
> > > > via_sub.iti_cod_itinerario_linha
> > > > and dvi_sub.vlo_cod_localizacao is not null
> > > > group by dvi_sub.via_cod_viagem,
> > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > > ;
> > > > dvi_cod_dados_via_iti | via_cod_viagem
> > > > -----------------------+----------------
> > > > 1059964445 | 7989813
> > > > 1060072725 | 7990876
> > > > (2 rows)
> > > >
> > > > Time: 27.944 ms
> > > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
> > > dvi_cod_dados_via_iti,
> > > > dvi_sub.via_cod_viagem
> > > > from gis_dvi_dados_viagem_itinerario dvi_sub,
> gis_via_viagem
> > > > via_sub, gis_iti_itinerario iti_sub
> > > > where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
> > > > and via_sub.via_status_viagem = 'A'
> > > > and via_sub.via_dt_hora_ini > now() - interval '9
> hours'
> > > > and iti_sub.lin_cod_linha = 389
> > > > and iti_sub.iti_sentido_itinerario = 'I'
> > > > and iti_sub.iti_cod_itinerario_linha =
> > > > via_sub.iti_cod_itinerario_linha
> > > > and dvi_sub.vlo_cod_localizacao is not null
> > > > group by dvi_sub.via_cod_viagem,
> > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
> > > > ;
> > > > dvi_cod_dados_via_iti | via_cod_viagem
> > > > -----------------------+----------------
> > > > 1059964445 | 7989813
> > > > 1060072727 | 7990876
> > > > (2 rows)
> > > >
> > > > Time: 24.428 ms
> > > >
> > > > Greetings
> > > >
> > > > Anderson
> > >
> > >
> > > --
> > > Bill Moran <wmoran(at)potentialtech(dot)com>
> > >
>
>
> --
> Bill Moran <wmoran(at)potentialtech(dot)com>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Bill Moran 2015-06-15 21:19:24 Re: Momentary Delay
Previous Message Robert Haas 2015-06-15 20:38:28 Re: [HACKERS] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1