Re: How to analyze a slowdown in 9.3.5?

From: Michael Nolan <htfoot(at)gmail(dot)com>
To: Andy Colson <andy(at)squeakycode(dot)net>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: How to analyze a slowdown in 9.3.5?
Date: 2015-01-10 21:00:01
Message-ID: CAOzAqu+5vnBo3g1dNy3Vnix-R+uuVjXNV4SBOaKzFPvwfzEhwA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

The function is a complicated plpgsql function that makes numerous database
queries, all read-only. (Other parts of that program may make changes to
the database.)

The first database shutdown and the shutdown/reboot later on were both
'clean' shutdowns, so there shouldn't have been any kind of transaction
rollback.

I has sar running on that server, if that provides any useful data. Mostly
I'm just trying to make up a list of what to look for and what to log in
case it happens again. (It runs again on Tuesday, and I already know it
will be going back to review 2004 data so it'll be an even longer run than
this week's was.)
--
Mike Nolan

On Sat, Jan 10, 2015 at 12:55 PM, Andy Colson <andy(at)squeakycode(dot)net> wrote:

> On 01/09/2015 07:52 PM, Tomas Vondra wrote:
>
>> On 9.1.2015 23:14, Michael Nolan wrote:
>>
>>> I'm running 9.3.5 on a virtual machine with 5 cores and 24 GB of
>>> memory. Disk is on a SAN.
>>>
>>> I have a task that runs weekly that processes possibly as many as
>>> 120 months worth of data, one month at a time. Since moving to 9.3.5
>>> (from 8.2!!) the average time for a month has been 3 minutes or less.
>>>
>>
>> Congrats to migrating to a supported version!
>>
>> Please, comparison of the configuration used on 8.2 and 9.3.5 would be
>> helpful (i.e. how you've updated the config on the new version?).
>>
>> However, when this job ran this Tuesday, it ran fine for a number of
>>> months, but then started slowing down dramatically, 300 minutes for
>>> one month and then 167 minutes for the next. I stopped and restarted
>>> postgresql, the next block also ran really slow (157 minutes.) I
>>> then rebooted the server and the remaining blocks ran at the usual
>>> fast speed again, so restarting postgresql didn't fix the problem
>>> but rebooting the server did.
>>>
>>
>> What amounts of data are we talking about? Gigabytes? Tens of gigabytes?
>>
>>
>>> Looking at the logs, I see queries with a function call that would
>>> normally take no more than 100-200 milliseconds, usually far less,
>>> that were taking 100 seconds or longer. This function gets called
>>> thousands of times for each month, so that appears to be one source
>>> of the slowdown.
>>>
>>
>> But why are the functions taking so much longer? Are they eating CPU,
>> I/O or are generally waiting for something (e.g. locks)?
>>
>>
>>> I don't suspect a memory leak in the calling program (in php),
>>> because since moving to this server in December this weekly task has
>>> run several times over the same range of months, making pretty much
>>> the same function calls each time. I also ran the entire range
>>> several times during testing.
>>>
>>> One change made to the server since the previous week's run was that
>>> I moved up to the latest Centos kernel (Linux version
>>> 3.10.0-123.13.2.el7.x86_64).
>>>
>>
>> And what was the previous kernel version?
>>
>> However, if it worked fine after rebooting the server, it may not be a
>> kernel issue (unless it somehow depends on uptime). Is there something
>> in the /var/log/messages?
>>
>>
> At first, I was thinking, lots of activity within one transaction was
> messing up the stats and the planner started getting it wrong. But a
> reboot wouldn't fix that. Would it? What if the reboot rolled back the
> db, would that stats make sense again?
>
> I have a process that makes a big temp table (with indexes). After its
> built if I dont run a quick analyze on it the planner never uses the
> indexes right.
>
> Another thing I can think of is never commiting. If it started collecting
> lots and lots of row versions it could get slower and slower. But, then,
> you'd see the same thing on 8.2, so, that's probably not it. Do you have
> any "Idle in transaction" connections?
>
> -Andy
>
>
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Michael Nolan 2015-01-10 21:11:40 Re: How to analyze a slowdown in 9.3.5?
Previous Message Andres Freund 2015-01-10 20:39:22 Re: unexpected PQresultStatus: 8 with simple logical replication