Quick Links

Re: autovacuum stuck on a table for 18+ hours, consuming lots of CPU time

From:	Lonni J Friedman <netllama(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject:	Re: autovacuum stuck on a table for 18+ hours, consuming lots of CPU time
Date:	2011-11-23 03:43:58
Message-ID:	CAP=oouF7P_zThUrcQPxVNYrhLByZ0T-TVdQa5iUKXN3BGyegoQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On Tue, Nov 22, 2011 at 7:19 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Lonni J Friedman <netllama(at)gmail(dot)com> writes:
>> Thanks for your prompt reply. I was pretty sure that I was using the
>> default, but just to confirm, I just ran:
>> 'SHOW vacuum_cost_delay;'
>
> What about autovacuum_vacuum_cost_delay? The selects seem to be
> delaying for 32msec, which is not the default for anything.

20ms is what it returns. I grepped for '32' in postgresql.conf and
other than the '5432' port, nothing was returned.

I'm using the official postgres RPMs that shipped with
Fedora15-x86_64, so unless something weird got compiled in, I have no
idea where that 32ms is coming from.

>
>> Is there no way to safely kill off this autovacuum process that
>> doesn't involve shutting down & restarting the database?
>
> Sending it a SIGINT ought to be safe enough, though I don't think that
> is necessarily advisable, because the next autovacuum will probably take
> just as long. Killing this one will mainly mean you've wasted (much of)
> the work it did so far.

ok, then I guess I'll wait longer.

>
> Before getting hasty I'd suggest identifying what table (or index) it's
> working on --- lsof on the process to see what FD 95 is connected to
> would be the first step. I'm thinking it's an index since the seek
> addresses don't seem to be consecutive. And it might be worth watching
> the seek addresses for awhile to see if you can prove that it's looping
> --- if it is, that might be an indication of a corrupt index. If it
> isn't, but is just steadily working through the index, you'd be best
> advised to have patience.

I suspect you're right. I just ran strace against that PID again, and
now all the lseek & read FD's are referrring to a different number
(115), so that means its moved onto something new since I looked a few
hours ago?

Anyway, I think this is what you were referring to:
/proc/30188/fd/115 -> /var/lib/pgsql/data/base/64793/72633.10

How do I correlate that file to an actual database object?

thanks

In response to

Re: autovacuum stuck on a table for 18+ hours, consuming lots of CPU time at 2011-11-23 03:19:21 from Tom Lane

Responses

Re: autovacuum stuck on a table for 18+ hours, consuming lots of CPU time at 2011-11-23 03:49:56 from Tom Lane

Browse pgsql-general by date

	From	Date	Subject
Next Message	Tom Lane	2011-11-23 03:49:56	Re: autovacuum stuck on a table for 18+ hours, consuming lots of CPU time
Previous Message	Ondrej Ivanič	2011-11-23 03:22:41	Re: autovacuum stuck on a table for 18+ hours, consuming lots of CPU time