Re: BUG #11444: autovacuum stuck for 5 days and waits on a lock

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: alexk(at)hintbits(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #11444: autovacuum stuck for 5 days and waits on a lock
Date: 2014-09-17 16:12:27
Message-ID: 20140917161227.GS25887@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi,

On 2014-09-17 16:02:16 +0000, alexk(at)hintbits(dot)com wrote:
> We've got a problem with autovacuum process running for already 5 days,
> without being blocked by any particular process.
> The table in question is around 500MB. Usually we don't see vacuums running
> there for more than couple of hours.
> Other autovacuum processes (on different tables) start and finish without
> issues.
>
> The vacuum and autovacuum settings are left at default values. We don't have
> debug information compiled on this server, but here's what we have after the
> gdb bt output, attaching to the autovacuum process.

> #0 0x00007f423cbf34f7 in semop () from /lib/x86_64-linux-gnu/libc.so.6
> #1 0x00007f423e8581f8 in PGSemaphoreLock ()
> #2 0x00007f423e889a45 in LockBufferForCleanup ()
> #3 0x00007f423e6d9cd6 in ?? ()
> #4 0x00007f423e6da08f in ?? ()
> #5 0x00007f423e6daaf6 in btbulkdelete ()

What is happening here is that vacuum (of a btree index) is waiting for a so called
'cleanup' lock on a page. That is it waits for the page not being pinned
by any other backend.

> semop(2523174, {{12, -1, 0}}, 1
> semop(2523174, {{12, -1, 0}}, 1
>
>
> ) = 0
> lseek(21, 0, SEEK_END) = 32276480
> select(0, NULL, NULL, NULL, {0, 21000}) = 0 (Timeout)
> semop(2523174, {{12, -1, 0}}, 1
>
> Most of the time it is stuck with semop.

That indicates it's waiting most of the time. But it does make progress.

> There are also UPDATE statements constantly running on this table, in the
> overlapping manner, so at a single moment there is at least one update
> running on it. We are investigating why is it done this way, but can it be a
> reason behind this strange vacuum behavior?

Which quite possibly is caused by this.

I've recently commented on -hackers that this is a very hard to debug
behaviour and should be made more visible, but IIRC we didn't come to a
conclusion...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2014-09-17 16:16:51 Re: pg_dump -Fd fails to detect ENOSPC
Previous Message alexk 2014-09-17 16:02:16 BUG #11444: autovacuum stuck for 5 days and waits on a lock