Re: VACUUM ANALYZE Issues

From: Harold Falkmeyer <hfalkmeyer(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: VACUUM ANALYZE Issues
Date: 2017-06-06 18:43:22
Message-ID: CACcYriumnVyQp53MyRqwJ-ZaTme+5NtTbZXoS=2F_5mXXQqHKw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

I'd forgotten to mention that the VACUUM VERBOSE ANALYZEs were/are
consistently completing their passes over indices. While the VACUUM
ANALYZEs were disruptive, and looping as described above, the VERBOSE
progress report was last indicating that "DETAIL: N dead row versions
cannot be removed yet."

Harold

On Mon, Jun 5, 2017 at 7:18 PM, Harold Falkmeyer <hfalkmeyer(at)gmail(dot)com>
wrote:

> Andres,
>
> Thank you for your reply. Please find my comments inline below.
>
> On Mon, Jun 5, 2017 at 3:47 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
>> Hi,
>>
>> On 2017-06-05 14:20:12 -0700, Harold Falkmeyer wrote:
>> > My firm manages a somewhat large PostgreSQL database (400'ish tables, 5
>> > TiB'ish on disk, 200'ish GiB pg_dumps, etc.). We're presently running
>> > PostgreSQL 8.4.22 on Linux 2.6.32 with Slony-I 2.0.8 for replication.
>>
>> First off, you should seriously consider upgrading. The situation has
>> gotten considerably better since 8.4, particularly in 9.6. Also the 8.4
>> branch hasn't received bugfixed (including security fixes) for nearly
>> three years.
>>
>
> Agreed. We plan on upgrading to at least 9.4.9, if not 9.6.x within the
> next several months; however, we're not quite there yet.
>
> > Over the past several weeks, our weekly VACUUM ANALYZE job has become
>> > disruptive to other queries, seemingly spinning on one particular table
>> > (one of our largest in rows, columns, indexes, foreign keys, and general
>> > use), ultimately requiring that the job be terminated. With the number
>> of
>> > INSERTs, UPDATEs, and DELETEs going through our system, it's imperative
>> > that VACUUM ANALYZEs be completed periodically. In the past, and given
>> our
>> > usage patterns, we've opted for weekly VACUUM ANALYZEs in lieu of
>> > autovacuum, which is presently disabled.
>>
>> That sounds like it might seriously exascerbate the problem - weekly is
>> really infrequent for both vacuum and analyze.
>>
>
> Agreed. The intention for the new setup is to enable autovacuum.
>
> > After about 60-90 minutes working on the aforementioned table, and with
>> the
>> > VACUUM ANALYZE at the top of our longest running queries list, other
>> > operations attempting to read from or write to on that table either
>> > complete extremely slowly or are awaiting locks held by the VACUUM
>> > ANALYZE. Sampling during the most recent disruptive period, the VACUUM
>> > ANALYZE pid held 20 locks: 17 RowExclusiveLocks, one
>> AccessExclusiveLock,
>> > and one ShareUpdateExclusive, all on the aforementioned table, and one
>> > ExclusiveLock for the containing transaction.
>>
>> An ExclusiveLock on the transaction is completely normal, and only used
>> for other sessions to be able to wait for the completion of the the
>> holder's transaction.
>>
>> It does not entirely sound like you're suffering from locking problems.
>> Are there any locks that are not granted (e.g. SELECT * FROM pg_locks
>> WHERE NOT granted;) during that period? If so, what commands are
>> using these locks, you should be able to do normal table manipulations
>> while vacuum is in progress.
>>
>
> Many other processes were listed as having ungranted locks, including many
> SELECT statements having ungranted AccessShareLocks for the affected
> relation. In fact, the ONLY granted locks for the affected relation were
> the AccessExclusiveLock and ShareUpdateExclusive locks held by the VACUUM
> ANALYZE. Under what circumstances does VACUUM ANALYZE attempt to acquire
> an AccessExclusiveLock?
>
> > After sending SIGTERMs to
>> > all other pids enumerated in pg_locks as holding or wanting locks on
>> that
>> > table, the VACUUM ANALYZE continued without apparent progress. At the
>> > time, the containing process was consuming a rough average of about 6%
>> of
>> > one CPU and 4.9% memory (no swapping), was holding 641 descriptors, was
>> in
>> > a seeming loop of lseek, read, memcpy, and memcmp, with occasional semop
>> > calls. During this process, we attempted to SIGTERM all other
>> connections
>> > to ensure no stale transactions, locks, etc. were preventing the VACUUM
>> > ANALYZE from finishing. Overall system load and I/O activity was quite
>> low.
>>
>> > Ultimately, we've felt forced to terminate the VACUUM ANALYZEs. After
>> > sending it a SIGTERM and after about 30 seconds of seeing no change, we
>> > attempted a service postgresql-8.4 stop. After 30 additional seconds,
>> we
>> > sent a SIGQUIT to the VACUUM ANALYZE process (which we always try to
>> > avoid), which terminated it and, in turn, PostgreSQL stopped.
>> Thereafter,
>> > the restarts seemed quick and uneventful, with cleanup happening without
>> > reported exception. Only once in the last month or so have we seen a
>> > VACUUM ANALYZE on this table complete.
>>
>> Well, that'll make the problem worse and worse. I suspect your setting
>> might not be appropriately aggressive to be able to collect the
>> accumulated cruft. What's your maintenance_work_mem setting, and what
>> are your vacuum_cost_delay/vacuum_cost_limit settings?
>>
>
> In what way do the SIGQUITs aggravate problems? I was under the
> impression that any residual artifacts left behind by a SIGQUIT (or
> immediate shutdown) are resolved when the database is next started!?
>
> Our maintenance_work_mem is 16 GiB, our vacuum_cost_delay is 0 ms, and our
> vacuum_cost_limit is 200. Host memory is 126 GiB. Perhaps a significant
> increase to maintenance_work_mem is warranted (and safe, given that we
> don't presently have the risk of multiple autovacuum workers). Do you have
> general rule-of-thumb recommendations for these settings (e.g., relative to
> other settings)?
>
> Are there other sources I should be querying for useful information? For
> example, I understand that pg_locks doesn't show all PostgreSQL locks that
> could cause interference (e.g., as mentioned in http://rhaas.blogspot.com/
> 2011/03/troubleshooting-stuck-vacuums.html)!?
>
> Greetings,
>>
>> Andres Freund
>>
>
>

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Harold Falkmeyer 2017-06-07 02:56:24 Immediate Shutdowns
Previous Message scott ribe 2017-06-06 16:23:02 Re: Does Postgres ever write to tables without file system timestamps getting updated?