Re: autovacuum stuck on a table for 18+ hours, consuming lots of CPU time

From: Lonni J Friedman <netllama(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: autovacuum stuck on a table for 18+ hours, consuming lots of CPU time
Date: 2011-11-23 03:43:58
Message-ID: CAP=oouF7P_zThUrcQPxVNYrhLByZ0T-TVdQa5iUKXN3BGyegoQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Nov 22, 2011 at 7:19 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Lonni J Friedman <netllama(at)gmail(dot)com> writes:
>> Thanks for your prompt reply.  I was pretty sure that I was using the
>> default, but just to confirm, I just ran:
>> 'SHOW vacuum_cost_delay;'
>
> What about autovacuum_vacuum_cost_delay?  The selects seem to be
> delaying for 32msec, which is not the default for anything.

20ms is what it returns. I grepped for '32' in postgresql.conf and
other than the '5432' port, nothing was returned.

I'm using the official postgres RPMs that shipped with
Fedora15-x86_64, so unless something weird got compiled in, I have no
idea where that 32ms is coming from.

>
>> Is there no way to safely kill off this autovacuum process that
>> doesn't involve shutting down & restarting the database?
>
> Sending it a SIGINT ought to be safe enough, though I don't think that
> is necessarily advisable, because the next autovacuum will probably take
> just as long.  Killing this one will mainly mean you've wasted (much of)
> the work it did so far.

ok, then I guess I'll wait longer.

>
> Before getting hasty I'd suggest identifying what table (or index) it's
> working on --- lsof on the process to see what FD 95 is connected to
> would be the first step.  I'm thinking it's an index since the seek
> addresses don't seem to be consecutive.  And it might be worth watching
> the seek addresses for awhile to see if you can prove that it's looping
> --- if it is, that might be an indication of a corrupt index.  If it
> isn't, but is just steadily working through the index, you'd be best
> advised to have patience.

I suspect you're right. I just ran strace against that PID again, and
now all the lseek & read FD's are referrring to a different number
(115), so that means its moved onto something new since I looked a few
hours ago?

Anyway, I think this is what you were referring to:
/proc/30188/fd/115 -> /var/lib/pgsql/data/base/64793/72633.10

How do I correlate that file to an actual database object?

thanks

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2011-11-23 03:49:56 Re: autovacuum stuck on a table for 18+ hours, consuming lots of CPU time
Previous Message Ondrej Ivanič 2011-11-23 03:22:41 Re: autovacuum stuck on a table for 18+ hours, consuming lots of CPU time