Re: Occasional spike in query response time (jumps from 200ms to 45 seconds)

From: Robert Voinea <rvoinea(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Occasional spike in query response time (jumps from 200ms to 45 seconds)
Date: 2013-11-08 15:43:22
Message-ID: 13209807.nWtyJFh2Fe@shu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Hi

On Thursday 07 November 2013 06:40:00 Kevin Grittner wrote:
> Robert Voinea <rvoinea(at)gmail(dot)com> wrote:
> > From time to time (1-2 months) one of the production systems I
> > manage starts acting crazy... and this is starting to become a
> > problem.
> >
> > Every query I send to the server has a very long running time
> > (sometimes it reaches 45+ seconds). Even the simples queries like
> > "SELECT NOW()" run in 4-5 seconds... or more.
> >
> > This goes on for a few minutes. After that, everything is back to
> > normal.
>
> A likely cause for this is transparent huge page defrag. If you
> run `vmstat 1` during an episode and see a lot of system CPU time,
> that tends to confirm this. If you run `perf top` during an
> episode and see the kernel spending a lot of time in spinlock
> functions, that pretty much nails it. You might want to turn off
> transparent huge page support.
I need to read about this first.

I forgot to mention that this is a 32 bit system... and due to the known
limitations we are forced to use PAE on the kernel...
Unfortunately we cannot use a 64 bit operating system.

> > I'm guessing that, somehow, there is heavy I/O usage at that time
>
> Which might be another possibility if the `vmstat 1` shows a lot of
> I/O wait time.
Another colleague of mine managed to somehow reproduce the issue.
It seems that the disk + network interface could be the bottle neck (iostat &
iotop shows average wait times very high) and the peak throughput of the disk
at 70MB/s.
This is not the actual production system, but a similar one (as hardware).
We'll try to re-do the tests with the clients and the server connected via a
Gigabit network and see if this helps.

> > I've seen that postgres 9.1.8 Changelog contains some bug fixes
> > related to performance of autovaccum:
> If there is high I/O and you see autovacuum running wraparound
> prevention vacuums or vacuuming tables which have recently had a
> large number of rows deleted, you might solve the problem by
> updating. I would strongly recommend updating anyway, because
> otherwise you are running with known bugs and security
> vulnerabilities, including one very serious one.
>
> Why knowingly put yourself through dealing with bugs that others
> have already encountered, reported, diagnosed, and fixed?
I agree with you on this, but the update requires downtime and has to be
scheduled in advance.

Thank you for replying.
I will investigate this even further and see what I may come up with.
I'll keep you informed with what I find.

--
Robert Voinea
Software Engineer
+4 0740 467 262

Don't take life too seriously. You'll never get out of it alive.
(Elbert Hubbard)

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Nagaraj Shindagi 2013-11-10 14:37:41 during the maintenance facing error
Previous Message bricklen 2013-11-08 13:33:44 Re: Occasional spike in query response time (jumps from 200ms to 45 seconds)