Re: VACUUM ANALYZE Issues

From: Harold Falkmeyer <hfalkmeyer(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: VACUUM ANALYZE Issues
Date: 2017-06-09 04:52:38
Message-ID: CACcYrivHfizYrWup2MNUjBSddx2RYwN-1qeV6Xdgj0qD3i-MnQ@mail.gmail.com
Lists: pgsql-admin

What would cause a VACUUM ANALYZE to acquire an ACCESS EXCLUSIVE lock (not
running as a VACUUM FULL)?
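As a hedged sketch (the table name is a placeholder, not from this thread), one way to catch that lock when it appears is to poll pg_locks from a second session while the VACUUM ANALYZE runs:

```sql
-- Sketch: watch for an ACCESS EXCLUSIVE lock on the affected table.
-- 'big_table' is a placeholder name; run this from a separate session.
SELECT l.pid, l.mode, l.granted
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
WHERE c.relname = 'big_table'
  AND l.mode = 'AccessExclusiveLock';
```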

Harold Falkmeyer

On Tue, Jun 6, 2017 at 11:43 AM, Harold Falkmeyer <hfalkmeyer(at)gmail(dot)com>
wrote:

> I'd forgotten to mention that the VACUUM VERBOSE ANALYZEs were/are
> consistently completing their passes over indices. While the VACUUM
> ANALYZEs were disruptive, and looping as described above, the VERBOSE
> progress report was last indicating that "DETAIL: N dead row versions
> cannot be removed yet."
>
> Harold
>
> On Mon, Jun 5, 2017 at 7:18 PM, Harold Falkmeyer <hfalkmeyer(at)gmail(dot)com>
> wrote:
>
>> Andres,
>>
>> Thank you for your reply. Please find my comments inline below.
>>
>> On Mon, Jun 5, 2017 at 3:47 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>>
>>> Hi,
>>>
>>> On 2017-06-05 14:20:12 -0700, Harold Falkmeyer wrote:
>>> > My firm manages a somewhat large PostgreSQL database (400'ish tables, 5
>>> > TiB'ish on disk, 200'ish GiB pg_dumps, etc.). We're presently running
>>> > PostgreSQL 8.4.22 on Linux 2.6.32 with Slony-I 2.0.8 for replication.
>>>
>>> First off, you should seriously consider upgrading. The situation has
>>> gotten considerably better since 8.4, particularly in 9.6. Also the 8.4
>>> branch hasn't received bugfixes (including security fixes) for nearly
>>> three years.
>>>
>>
>> Agreed. We plan on upgrading to at least 9.4.9, if not 9.6.x, within the
>> next several months; however, we're not quite there yet.
>>
>>> > Over the past several weeks, our weekly VACUUM ANALYZE job has become
>>> > disruptive to other queries, seemingly spinning on one particular table
>>> > (one of our largest in rows, columns, indexes, foreign keys, and general
>>> > use), ultimately requiring that the job be terminated. With the number of
>>> > INSERTs, UPDATEs, and DELETEs going through our system, it's imperative
>>> > that VACUUM ANALYZEs be completed periodically. In the past, and given our
>>> > usage patterns, we've opted for weekly VACUUM ANALYZEs in lieu of
>>> > autovacuum, which is presently disabled.
>>>
>>> That sounds like it might seriously exacerbate the problem - weekly is
>>> really infrequent for both vacuum and analyze.
>>>
>>
>> Agreed. The intention for the new setup is to enable autovacuum.
>>
>>> > After about 60-90 minutes working on the aforementioned table, and with the
>>> > VACUUM ANALYZE at the top of our longest running queries list, other
>>> > operations attempting to read from or write to that table either
>>> > complete extremely slowly or are awaiting locks held by the VACUUM
>>> > ANALYZE. Sampling during the most recent disruptive period, the VACUUM
>>> > ANALYZE pid held 20 locks: 17 RowExclusiveLocks, one AccessExclusiveLock,
>>> > and one ShareUpdateExclusiveLock, all on the aforementioned table, and one
>>> > ExclusiveLock for the containing transaction.
>>>
>>> An ExclusiveLock on the transaction is completely normal, and only used
>>> for other sessions to be able to wait for the completion of the
>>> holder's transaction.
>>>
>>> It does not entirely sound like you're suffering from locking problems.
>>> Are there any locks that are not granted (e.g. SELECT * FROM pg_locks
>>> WHERE NOT granted;) during that period? If so, what commands are
>>> holding those locks? You should be able to do normal table manipulations
>>> while vacuum is in progress.
>>>
>>
>> Many other processes were listed as having ungranted locks, including
>> many SELECT statements having ungranted AccessShareLocks for the affected
>> relation. In fact, the ONLY granted locks for the affected relation were
>> the AccessExclusiveLock and ShareUpdateExclusive locks held by the VACUUM
>> ANALYZE. Under what circumstances does VACUUM ANALYZE attempt to acquire
>> an AccessExclusiveLock?
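For reference, a sketch extending the pg_locks query suggested above to show what each blocked backend was trying to run. The column names procpid and current_query are the 8.4-era names; 9.2 and later use pid and query instead:

```sql
-- Sketch: pair each ungranted lock with the statement waiting on it.
SELECT l.pid, l.locktype, l.mode, a.current_query
FROM pg_locks l
JOIN pg_stat_activity a ON a.procpid = l.pid
WHERE NOT l.granted;
```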
>>
>>> > After sending SIGTERMs to
>>> > all other pids enumerated in pg_locks as holding or wanting locks on that
>>> > table, the VACUUM ANALYZE continued without apparent progress. At the
>>> > time, the containing process was consuming a rough average of about 6% of
>>> > one CPU and 4.9% memory (no swapping), was holding 641 descriptors, was in
>>> > a seeming loop of lseek, read, memcpy, and memcmp, with occasional semop
>>> > calls. During this process, we attempted to SIGTERM all other connections
>>> > to ensure no stale transactions, locks, etc. were preventing the VACUUM
>>> > ANALYZE from finishing. Overall system load and I/O activity were quite low.
>>>
>>> > Ultimately, we've felt forced to terminate the VACUUM ANALYZEs. After
>>> > sending it a SIGTERM and after about 30 seconds of seeing no change, we
>>> > attempted a service postgresql-8.4 stop. After 30 additional seconds, we
>>> > sent a SIGQUIT to the VACUUM ANALYZE process (which we always try to
>>> > avoid), which terminated it and, in turn, PostgreSQL stopped. Thereafter,
>>> > the restarts seemed quick and uneventful, with cleanup happening without
>>> > reported exception. Only once in the last month or so have we seen a
>>> > VACUUM ANALYZE on this table complete.
>>>
>>> Well, that'll make the problem worse and worse. I suspect your settings
>>> might not be aggressive enough to be able to collect the
>>> accumulated cruft. What's your maintenance_work_mem setting, and what
>>> are your vacuum_cost_delay/vacuum_cost_limit settings?
>>>
>>
>> In what way do the SIGQUITs aggravate problems? I was under the
>> impression that any residual artifacts left behind by a SIGQUIT (or
>> immediate shutdown) are resolved when the database is next started!?
>>
>> Our maintenance_work_mem is 16 GiB, our vacuum_cost_delay is 0 ms, and
>> our vacuum_cost_limit is 200. Host memory is 126 GiB. Perhaps a
>> significant increase to maintenance_work_mem is warranted (and safe, given
>> that we don't presently have the risk of multiple autovacuum workers). Do
>> you have general rule-of-thumb recommendations for these settings (e.g.,
>> relative to other settings)?
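For context, a sketch of how the settings discussed here look in postgresql.conf; the values shown are illustrative assumptions, not recommendations from this thread. One detail I believe applies to this branch: lazy VACUUM's dead-tuple array is capped at roughly 1 GB regardless of a larger maintenance_work_mem.

```
# Illustrative postgresql.conf fragment (values are assumptions).
maintenance_work_mem = 1GB   # lazy VACUUM's dead-tuple array is capped
                             # near 1GB even if this is set higher
vacuum_cost_delay = 0        # 0 disables cost-based vacuum throttling
vacuum_cost_limit = 200      # only takes effect when the delay is > 0
```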
>>
>> Are there other sources I should be querying for useful information? For
>> example, I understand that pg_locks doesn't show all PostgreSQL locks that
>> could cause interference (e.g., as mentioned in
>> http://rhaas.blogspot.com/2011/03/troubleshooting-stuck-vacuums.html)!?
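As a sketch of two commonly cited culprits that don't show up as lock waits (adapted here as assumptions, with 8.4-era column names), long-open transactions and orphaned prepared transactions can both hold back the xmin horizon and leave dead rows that "cannot be removed yet":

```sql
-- Oldest open transactions (8.4 columns; 9.2+ uses pid/query):
SELECT procpid, xact_start, current_query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 5;

-- Orphaned two-phase-commit transactions:
SELECT gid, prepared, owner FROM pg_prepared_xacts;
```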
>>
>> Greetings,
>>>
>>> Andres Freund
>>>
>>
>>
>
