Re: VACUUM ANALYZE Issues

From: Harold Falkmeyer <hfalkmeyer(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: VACUUM ANALYZE Issues
Date: 2017-06-06 02:18:34
Message-ID: CACcYriuL4QUqYYFtLAdQB4CEb4_rZveHe3SskzHXiSpA=gxKCw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Andres,

Thank you for your reply. Please find my comments inline below.

On Mon, Jun 5, 2017 at 3:47 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2017-06-05 14:20:12 -0700, Harold Falkmeyer wrote:
> > My firm manages a somewhat large PostgreSQL database (400'ish tables, 5
> > TiB'ish on disk, 200'ish GiB pg_dumps, etc.). We're presently running
> > PostgreSQL 8.4.22 on Linux 2.6.32 with Slony-I 2.0.8 for replication.
>
> First off, you should seriously consider upgrading. The situation has
> gotten considerably better since 8.4, particularly in 9.6. Also the 8.4
> branch hasn't received bugfixed (including security fixes) for nearly
> three years.
>

Agreed. We plan on upgrading to at least 9.4.9, if not 9.6.x within the
next several months; however, we're not quite there yet.

> Over the past several weeks, our weekly VACUUM ANALYZE job has become
> > disruptive to other queries, seemingly spinning on one particular table
> > (one of our largest in rows, columns, indexes, foreign keys, and general
> > use), ultimately requiring that the job be terminated. With the number
> of
> > INSERTs, UPDATEs, and DELETEs going through our system, it's imperative
> > that VACUUM ANALYZEs be completed periodically. In the past, and given
> our
> > usage patterns, we've opted for weekly VACUUM ANALYZEs in lieu of
> > autovacuum, which is presently disabled.
>
> That sounds like it might seriously exascerbate the problem - weekly is
> really infrequent for both vacuum and analyze.
>

Agreed. The intention for the new setup is to enable autovacuum.

> After about 60-90 minutes working on the aforementioned table, and with
> the
> > VACUUM ANALYZE at the top of our longest running queries list, other
> > operations attempting to read from or write to on that table either
> > complete extremely slowly or are awaiting locks held by the VACUUM
> > ANALYZE. Sampling during the most recent disruptive period, the VACUUM
> > ANALYZE pid held 20 locks: 17 RowExclusiveLocks, one AccessExclusiveLock,
> > and one ShareUpdateExclusive, all on the aforementioned table, and one
> > ExclusiveLock for the containing transaction.
>
> An ExclusiveLock on the transaction is completely normal, and only used
> for other sessions to be able to wait for the completion of the the
> holder's transaction.
>
> It does not entirely sound like you're suffering from locking problems.
> Are there any locks that are not granted (e.g. SELECT * FROM pg_locks
> WHERE NOT granted;) during that period? If so, what commands are
> using these locks, you should be able to do normal table manipulations
> while vacuum is in progress.
>

Many other processes were listed as having ungranted locks, including many
SELECT statements having ungranted AccessShareLocks for the affected
relation. In fact, the ONLY granted locks for the affected relation were
the AccessExclusiveLock and ShareUpdateExclusive locks held by the VACUUM
ANALYZE. Under what circumstances does VACUUM ANALYZE attempt to acquire
an AccessExclusiveLock?

> After sending SIGTERMs to
> > all other pids enumerated in pg_locks as holding or wanting locks on that
> > table, the VACUUM ANALYZE continued without apparent progress. At the
> > time, the containing process was consuming a rough average of about 6% of
> > one CPU and 4.9% memory (no swapping), was holding 641 descriptors, was
> in
> > a seeming loop of lseek, read, memcpy, and memcmp, with occasional semop
> > calls. During this process, we attempted to SIGTERM all other
> connections
> > to ensure no stale transactions, locks, etc. were preventing the VACUUM
> > ANALYZE from finishing. Overall system load and I/O activity was quite
> low.
>
> > Ultimately, we've felt forced to terminate the VACUUM ANALYZEs. After
> > sending it a SIGTERM and after about 30 seconds of seeing no change, we
> > attempted a service postgresql-8.4 stop. After 30 additional seconds, we
> > sent a SIGQUIT to the VACUUM ANALYZE process (which we always try to
> > avoid), which terminated it and, in turn, PostgreSQL stopped.
> Thereafter,
> > the restarts seemed quick and uneventful, with cleanup happening without
> > reported exception. Only once in the last month or so have we seen a
> > VACUUM ANALYZE on this table complete.
>
> Well, that'll make the problem worse and worse. I suspect your setting
> might not be appropriately aggressive to be able to collect the
> accumulated cruft. What's your maintenance_work_mem setting, and what
> are your vacuum_cost_delay/vacuum_cost_limit settings?
>

In what way do the SIGQUITs aggravate problems? I was under the impression
that any residual artifacts left behind by a SIGQUIT (or immediate
shutdown) are resolved when the database is next started!?

Our maintenance_work_mem is 16 GiB, our vacuum_cost_delay is 0 ms, and our
vacuum_cost_limit is 200. Host memory is 126 GiB. Perhaps a significant
increase to maintenance_work_mem is warranted (and safe, given that we
don't presently have the risk of multiple autovacuum workers). Do you have
general rule-of-thumb recommendations for these settings (e.g., relative to
other settings)?

Are there other sources I should be querying for useful information? For
example, I understand that pg_locks doesn't show all PostgreSQL locks that
could cause interference (e.g., as mentioned in
http://rhaas.blogspot.com/2011/03/troubleshooting-stuck-vacuums.html)!?

Greetings,
>
> Andres Freund
>

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Thorsten Schöning 2017-06-06 08:04:54 Does Postgres ever write to tables without file system timestamps getting updated?
Previous Message Sergey Burladyan 2017-06-05 22:53:29 Questions about upgrade standby with rsync