VACUUM ANALYZE Issues

From: Harold Falkmeyer <hfalkmeyer(at)gmail(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: VACUUM ANALYZE Issues
Date: 2017-06-05 21:20:12
Message-ID: CACcYriu+wuhS7rDNgBRaySb7J59i3Y_4aqxLP+-L+xst4=4Syw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Greetings,

My firm manages a somewhat large PostgreSQL database (400'ish tables, 5
TiB'ish on disk, 200'ish GiB pg_dumps, etc.). We're presently running
PostgreSQL 8.4.22 on Linux 2.6.32 with Slony-I 2.0.8 for replication.

Over the past several weeks, our weekly VACUUM ANALYZE job has become
disruptive to other queries, seemingly spinning on one particular table
(one of our largest in rows, columns, indexes, foreign keys, and general
use), ultimately requiring that the job be terminated. With the number of
INSERTs, UPDATEs, and DELETEs going through our system, it's imperative
that VACUUM ANALYZEs be completed periodically. In the past, and given our
usage patterns, we've opted for weekly VACUUM ANALYZEs in lieu of
autovacuum, which is presently disabled. If necessary, we can schedule
downtime, dereplicate the affected table, entirely rebuild the affected
table, and rereplicate. Though, if possible, we're looking for a solution
without significant downtime.

After about 60-90 minutes working on the aforementioned table, and with the
VACUUM ANALYZE at the top of our longest running queries list, other
operations attempting to read from or write to on that table either
complete extremely slowly or are awaiting locks held by the VACUUM
ANALYZE. Sampling during the most recent disruptive period, the VACUUM
ANALYZE pid held 20 locks: 17 RowExclusiveLocks, one AccessExclusiveLock,
and one ShareUpdateExclusive, all on the aforementioned table, and one
ExclusiveLock for the containing transaction. After sending SIGTERMs to
all other pids enumerated in pg_locks as holding or wanting locks on that
table, the VACUUM ANALYZE continued without apparent progress. At the
time, the containing process was consuming a rough average of about 6% of
one CPU and 4.9% memory (no swapping), was holding 641 descriptors, was in
a seeming loop of lseek, read, memcpy, and memcmp, with occasional semop
calls. During this process, we attempted to SIGTERM all other connections
to ensure no stale transactions, locks, etc. were preventing the VACUUM
ANALYZE from finishing. Overall system load and I/O activity was quite low.

Ultimately, we've felt forced to terminate the VACUUM ANALYZEs. After
sending it a SIGTERM and after about 30 seconds of seeing no change, we
attempted a service postgresql-8.4 stop. After 30 additional seconds, we
sent a SIGQUIT to the VACUUM ANALYZE process (which we always try to
avoid), which terminated it and, in turn, PostgreSQL stopped. Thereafter,
the restarts seemed quick and uneventful, with cleanup happening without
reported exception. Only once in the last month or so have we seen a
VACUUM ANALYZE on this table complete.

Thank you in advance,

Harold Falkmeyer
hfalkmeyer(at)yahoo(dot)com

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Andres Freund 2017-06-05 22:47:25 Re: VACUUM ANALYZE Issues
Previous Message Steven Chang 2017-06-04 09:19:44 Re: Re: listen_addresses = '*' postgresql 9.5 does not allow local connect