PostgreSQL databases require periodic
maintenance known as vacuuming, and require
periodic updates to the statistics used by the
PostgreSQL query planner. The VACUUM
and ANALYZE
commands
perform these maintenance tasks. The autovacuum
daemon automatically schedules maintenance tasks based on
workload requirements.
The autovacuum daemon has to process each table regularly for several reasons:

1. To recover or reuse disk space occupied by updated or deleted rows.

2. To update the visibility map, which speeds up index-only scans and helps
VACUUM
operations avoid needlessly scanning already-frozen pages.

3. To manage the transaction ID address space by freezing older transaction IDs.

4. To enable eventual truncation of transaction status information (such as
pg_xact
) for the
entire cluster.

5. To update the statistics used by the PostgreSQL query planner.
The first four maintenance tasks are handled by running
VACUUM
from within an autovacuum worker process. The
fifth and final task (maintenance of planner statistics) is handled by
running ANALYZE
from within an autovacuum worker
process.
Generally speaking, database administrators new to tuning autovacuum should
start by considering the need to adjust autovacuum's scheduling.
Autovacuum scheduling is controlled via threshold settings. These settings
determine when autovacuum should launch a worker to run
VACUUM
and/or ANALYZE
; see the
previous section, Section 25.1. This section provides
additional information about the design and goals of autovacuum,
VACUUM
, and ANALYZE
. The intended
audience is database administrators who wish to perform more advanced
autovacuum tuning, with any of the following goals in mind:
Tuning VACUUM
to improve query response times.
Making sure that VACUUM
's management of the
transaction ID address space is functioning optimally.
Tuning VACUUM
for performance stability.
With larger installations, tuning autovacuum usually won't be a one-off task; it is best to approach tuning as an iterative, ongoing process.
Autovacuum might create a lot of I/O traffic at times, which can cause poor performance for other active sessions. There are configuration parameters you can adjust to reduce the impact on system response time. See the autovacuum-specific cost delay settings described in Section 20.10, and additional cost delay settings described in Section 20.4.4.
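The cost delay settings can be adjusted both server-wide and for individual tables. The following is a minimal sketch, assuming a hypothetical table named orders; the exact values are workload-dependent:

```sql
-- Throttle autovacuum harder for one especially busy table. A higher
-- cost delay reduces autovacuum's I/O impact, at the price of slower
-- vacuuming of that table.
ALTER TABLE orders SET (autovacuum_vacuum_cost_delay = 10);

-- Or adjust the server-wide default and reload the configuration:
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '5ms';
SELECT pg_reload_conf();
```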
Database administrators might also find it useful to supplement the
daemon's activities with manually-managed VACUUM
commands. Scripting tools like cron and
Task Scheduler can help with this. It can be
useful to perform off-hours VACUUM
commands during
periods when the application experiences less demand (e.g., on weekends, or
in the middle of the night). This section applies equally to
manually-issued VACUUM
and ANALYZE
operations, except where otherwise noted.
You can monitor VACUUM
progress (whether run by
autovacuum or manually) via the
pg_stat_progress_vacuum
view. See
Section 28.5.5.
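For example, a query along the following lines shows every VACUUM currently in progress, assuming sufficient privileges to see other sessions:

```sql
-- One row per backend (autovacuum worker or manual VACUUM) currently
-- running VACUUM; the phase and block counters show how far along it is.
SELECT p.pid,
       p.relid::regclass AS table_name,
       p.phase,
       p.heap_blks_scanned,
       p.heap_blks_total
FROM pg_stat_progress_vacuum AS p;
```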
In PostgreSQL, an
UPDATE
or DELETE
of a row does not
immediately remove the old version of the row.
This approach is necessary to gain the benefits of multiversion
concurrency control (MVCC, see Chapter 13): the row version
must not be deleted while it is still potentially visible to other
transactions. A deleted row version (whether from an
UPDATE
or DELETE
) will usually cease
to be of interest to any still-running transaction shortly after the
original deleting transaction commits.
The space dead tuples occupy must eventually be reclaimed for reuse by new
rows, to avoid unbounded growth of disk space requirements. Reclaiming
space from dead rows is VACUUM
's main responsibility.
The transaction ID number
(XID) based cutoff point that
VACUUM
uses to determine if a deleted tuple is safe to
physically remove is reported under removable cutoff
in
the server log when autovacuum logging (controlled by log_autovacuum_min_duration) reports on a
VACUUM
operation executed by autovacuum. Tuples that
are not yet safe to remove are counted as dead but not yet
removable
tuples in the log report. VACUUM
establishes its removable cutoff
once, at the start of
the operation. Any older MVCC snapshot (or transaction
that allocates an XID) that's still running when the cutoff is established
may hold it back.
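These log reports only appear once autovacuum logging is enabled. A minimal sketch of enabling it for all autovacuum operations:

```sql
-- A setting of 0 logs every autovacuum-run VACUUM and ANALYZE;
-- -1 disables this logging entirely.
ALTER SYSTEM SET log_autovacuum_min_duration = 0;
SELECT pg_reload_conf();
```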
It's critical that no long-running transactions are allowed to hold back
every VACUUM
operation's cutoff for an extended
period. It may be a good idea to add monitoring to alert you about this.
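A simple starting point for such monitoring is to query pg_stat_activity for the oldest open transactions, for example:

```sql
-- Find the longest-running transactions; any of these can hold back
-- every VACUUM's removable cutoff for as long as it stays open.
SELECT pid,
       now() - xact_start AS xact_age,
       state,
       query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 5;
```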
VACUUM
can remove tuples inserted by aborted
transactions immediately; such tuples were never visible to any other
transaction in the first place.
VACUUM
usually doesn't return space to the operating
system. There is one exception: space is returned to the OS whenever a
group of contiguous empty pages appears at the end of a table.
VACUUM
must acquire an ACCESS
EXCLUSIVE
lock to perform relation truncation. You can disable
relation truncation by setting the table's
vacuum_truncate
storage parameter to
off
.
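For example, truncation can be disabled for a hypothetical table named measurements as follows:

```sql
-- Disable end-of-table truncation (and the brief ACCESS EXCLUSIVE
-- lock it requires) for this table only:
ALTER TABLE measurements SET (vacuum_truncate = off);
```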
If you have a table whose entire contents are deleted periodically,
consider using TRUNCATE
rather than
DELETE
. TRUNCATE
removes the entire
table's contents immediately, obviating the need for
VACUUM
. One disadvantage is that strict
MVCC semantics are violated.
VACUUM FULL
(or CLUSTER
) can be
useful when dealing with extreme amounts of dead tuples. It can reclaim
more disk space, but it is much slower, and usually more disruptive.
VACUUM FULL
writes an entirely new copy of the table
and rebuilds all of the table's indexes. This makes it suitable for
highly fragmented tables, and tables where significant amounts of space
can be reclaimed.
Although VACUUM FULL
is technically an option of the
VACUUM
command, VACUUM FULL
uses a
completely different implementation. VACUUM FULL
is
essentially a variant of CLUSTER
. (The name
VACUUM FULL
is historical; the original implementation
was closer to standard VACUUM
.)
TRUNCATE
, VACUUM FULL
, and
CLUSTER
all require an ACCESS
EXCLUSIVE
lock, which can be highly disruptive
(SELECT
, INSERT
,
UPDATE
, and DELETE
commands can't
run at the same time).
VACUUM FULL
and CLUSTER
temporarily
use extra disk space. The extra space required is approximately equal to
the size of the table, since the old copies of the table and indexes
can't be released until the new ones are complete.
VACUUM
often marks some of the pages that it scans
frozen, indicating that all eligible rows on the page
were inserted by a transaction that committed sufficiently far in the past
that the effects of the inserting transaction are certain to be visible to
all current and future transactions. The specific transaction ID number
(XID) stored in a frozen heap row's
xmin
field is no longer needed to determine its
visibility. Furthermore, when a row undergoing freezing has an XID set in
its xmax
field (e.g., an XID left behind by an
earlier SELECT FOR UPDATE
row locker), the
xmax
field's XID is usually also removed.
Once frozen, heap pages are “self-contained”. Every query
can read all of the page's rows in a way that assumes that the inserting
transaction committed and is visible to its MVCC
snapshot. No query will ever have to consult external transaction status
metadata to interpret the page's contents, either. In particular,
pg_xact
transaction XID commit/abort status lookups
won't occur during query execution.
Freezing is a WAL-logged operation, so when
VACUUM
freezes a heap page, any copy of the page
located on a physical replication standby server will itself be
“frozen” shortly thereafter (when the relevant
FREEZE_PAGE
WAL record is replayed
on the standby). Queries that run on physical replication standbys avoid
pg_xact
lookups when reading from frozen pages, just
like queries that run on the primary server
[15].
VACUUM
generally postpones some freezing work as an
optimization, but VACUUM
cannot delay freezing forever.
Since on-disk storage of transaction IDs in heap row headers uses a
truncated 32-bit representation to save space (rather than the
full 64-bit representation used in other contexts), freezing plays a crucial
role in enabling management of the XID
address space by VACUUM
. If, for whatever
reason, VACUUM
is unable to freeze older XIDs on behalf
of an application that continues to require XID allocations, the system
will eventually refuse to allocate
transaction IDs due to transaction ID exhaustion (though this is
unlikely to occur unless autovacuum is configured incorrectly).
vacuum_freeze_min_age controls when freezing takes
place. When VACUUM
scans a heap page containing even
one XID that has already attained an age exceeding this value, the page is
frozen.
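For example, freezing can be forced at the earliest opportunity during a single manual VACUUM by lowering the setting for the current session (pgbench_history is a hypothetical table name):

```sql
-- Freeze every eligible tuple on every page that this VACUUM scans:
SET vacuum_freeze_min_age = 0;
VACUUM VERBOSE pgbench_history;
RESET vacuum_freeze_min_age;
```

The same effect (while also forcing the aggressive strategy) is available more concisely via VACUUM (FREEZE).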
Multixact IDs support row locking by multiple
transactions. Since there is only limited space in a heap tuple header to store lock
information, that information is encoded as a “multiple transaction
ID”, or Multixact ID for short, whenever there is more than one
transaction concurrently locking a row. Information about which
transaction IDs are included in any particular Multixact ID is stored
separately in pg_multixact
. Only the Multixact ID
itself (a 32-bit integer) appears in the tuple's
xmax
field. This creates a dependency on
external Multixact ID transaction status information. This is similar to
the dependency ordinary unfrozen XIDs have on commit status information
from pg_xact
. VACUUM
must
therefore occasionally remove Multixact IDs from tuples during freezing.
vacuum_multixact_freeze_min_age also controls when
freezing takes place. It is analogous to
vacuum_freeze_min_age
, but “age” is
expressed in Multixact ID units. Lowering
vacuum_multixact_freeze_min_age
forces VACUUM
to process
xmax
fields with a Multixact ID in cases where
it would otherwise postpone the work of processing
xmax
until the next VACUUM
[16]. The setting generally doesn't significantly influence the
total number of pages VACUUM
freezes, even in tables
containing many Multixact IDs. This is because VACUUM
generally prefers proactive processing for most individual
xmax
fields that contain a Multixact ID (eager
proactive processing is typically cheaper).
Managing the added WAL volume from freezing over time
is a vital consideration for VACUUM
. It is why
VACUUM
doesn't just freeze every eligible tuple at the
earliest opportunity: the WAL written to freeze a
page's tuples is wasted in cases where the resulting frozen tuples are
soon deleted or updated anyway. It's also why VACUUM
will freeze all eligible tuples from a heap page once
the decision to freeze at least one tuple is taken: at that point, the
added cost of freezing all eligible tuples eagerly (measured in
“extra bytes of WAL written”) is far lower
than the probable cost of deferring freezing until a future
VACUUM
operation against the same table. Furthermore,
once the page is frozen, it can generally be marked as all-frozen within the
visibility map immediately afterwards.
In PostgreSQL versions before 16,
VACUUM
triggered freezing at the level of individual
xmin
and xmax
fields. Freezing only affected the exact XIDs that had already attained
an age of vacuum_freeze_min_age
or greater.
VACUUM
also triggers the freezing of a page in cases
where it already proved necessary to write out a full page image
(FPI) as part of a WAL record
describing how dead tuples were removed [17] (see Section 30.1 for background
information about how FPIs provide torn page
protection). This “freeze on an FPI
write” batching mechanism avoids an expected additional
FPI for the same page later on (this is the probable
outcome of lazily deferring freezing until vacuum_freeze_min_age
forces it). In effect, VACUUM
generates slightly more
WAL in the short term with the aim of ultimately
needing to generate much less WAL in the long term.
For tables that receive INSERT
operations, but few or
no UPDATE
/DELETE
operations, it
might be beneficial to lower autovacuum_freeze_min_age
for the table. This makes VACUUM
freeze the table's
pages “eagerly” during earlier autovacuums triggered by
autovacuum_vacuum_insert_scale_factor, which
improves performance stability for some workloads.
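A sketch of such per-table tuning, assuming a hypothetical append-mostly table named events (the exact values are workload-dependent):

```sql
-- Freeze pages at the earliest opportunity during the autovacuums
-- triggered by inserts, and trigger those autovacuums more often:
ALTER TABLE events SET (
    autovacuum_freeze_min_age = 0,
    autovacuum_vacuum_insert_scale_factor = 0.05
);
```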
As noted already, freezing doesn't just allow queries to avoid lookups of
subsidiary transaction status information in structures such as
pg_xact
. Freezing also plays a crucial role in
enabling transaction ID address space management by
VACUUM
(and autovacuum). VACUUM
maintains information about the oldest unfrozen XID that remains in the
table when it uses its aggressive strategy.
Aggressive VACUUM
updates the table's pg_class
.relfrozenxid
to the oldest XID that
VACUUM
observed but didn't freeze, and that therefore still
remained at the end of processing. The table's
relfrozenxid
“advances” by a
certain number of XIDs (relative to the previous value set during the last
aggressive VACUUM
) as progress on freezing the oldest
pages in the table permits. Aggressive VACUUM
will
occasionally need to advance the whole database's pg_database
.datfrozenxid
afterwards, too — this is the minimum of the per-table
relfrozenxid
values (i.e., the earliest
relfrozenxid
) within the database.
Aggressive VACUUM
may need to perform significant
amounts of “catch-up” freezing missed by earlier
non-aggressive VACUUM
s, because non-aggressive
VACUUM
sometimes allows unfrozen pages to build up.
Over time, aggressive autovacuuming has two beneficial effects on the system as a whole:
The maximum XID age that the system can tolerate (i.e., the maximum
“distance” between the oldest unfrozen transaction ID in any
table in the database cluster and the next unallocated transaction ID) is
about 2.1 billion transaction IDs. This “maximum XID age”
invariant makes it fundamentally impossible to postpone aggressive
VACUUM
s (and freezing) forever. While there is no
simple formula for determining an oldest XID “age” for
database administrators to target, the invariant imposes a 2.1 billion
XID age hard limit — so there is a clear point
at which unfrozen XIDs should always be considered
“too old”, regardless of individual application requirements
or workload characteristics. If the hard limit is reached, the system
experiences transaction ID
exhaustion, which temporarily prevents the allocation of new
permanent transaction IDs. The system will only regain the ability to
allocate new transaction IDs when VACUUM
succeeds in
advancing the oldest datfrozenxid
in the cluster
(following an aggressive VACUUM
that runs to
completion against the table with the oldest
relfrozenxid
).
Aggressive VACUUM
also maintains the pg_class
.relminmxid
and
pg_database
.datminmxid
fields. These are needed to track the oldest Multixact ID in the table
and database, respectively. There are analogous rules, driven by
analogous considerations about managing the Multixact ID space. This
doesn't usually affect aggressive vacuuming requirements to a noticeable
degree, but can in databases that consume more Multixact IDs than
transaction IDs.
VACUUM
may not always freeze tuple
xmin
XIDs that have reached
vacuum_freeze_min_age
in age. The basic eligibility
criterion for freezing is the same as the one that determines whether a
deleted tuple is safe for VACUUM
to remove: the
XID-based removable cutoff
(this is one of the
details that appears in the server log's reports on autovacuum
[18]).
In extreme cases, a long-running transaction can hold back every
VACUUM
's removable cutoff
for so
long that the system experiences
transaction ID exhaustion.
See Section 28.3 for details on how to monitor
relfrozenxid
and
relminmxid
age to avoid
transaction ID/Multixact ID exhaustion.
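For example, a query along these lines reports the tables whose relfrozenxid and relminmxid are furthest in the past:

```sql
-- Tables with the oldest unfrozen XIDs and Multixact IDs; compare the
-- ages against autovacuum_freeze_max_age and the 2.1 billion hard limit.
SELECT c.oid::regclass AS table_name,
       age(c.relfrozenxid) AS xid_age,
       mxid_age(c.relminmxid) AS mxid_age
FROM pg_class AS c
WHERE c.relkind IN ('r', 'm', 't')
ORDER BY age(c.relfrozenxid) DESC
LIMIT 10;
```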
These issues can be debugged by following autovacuum log reports from
the server log over time: the log reports will include information about
the age of each VACUUM
's removable
cutoff
at the point the VACUUM
ended.
It may be useful to correlate an excessively old cutoff
with application-level problems such as long-running
transactions.
The 2.1 billion XIDs “maximum XID age” invariant must be preserved because transaction IDs stored in heap tuple headers use a truncated 32-bit representation (rather than the full 64-bit representation used in other contexts). Since all unfrozen transaction IDs from heap tuple headers must be from the same transaction ID epoch (or from a space in the 64-bit representation that spans two adjoining transaction ID epochs), there isn't any need to include a separate epoch field in each tuple header (see Section 74.1.2.1 for further details). This scheme requires much less on-disk storage space than a design that stores full 64-bit XIDs (consisting of a 32-bit epoch and a 32-bit partial XID) in heap tuple headers. On the other hand, it constrains the system's ability to allocate new XIDs in the worst case scenario where transaction ID exhaustion occurs.
There is only one major behavioral difference
between aggressive VACUUM
and non-aggressive
VACUUM
: non-aggressive VACUUM
skips
pages marked as all-visible using the visibility map, whereas aggressive
VACUUM
only skips the subset of pages that are both
all-visible and all-frozen. In other words, pages
that are just all-visible at the beginning of an
aggressive VACUUM
must be scanned, not skipped.
Scanning existing all-visible pages is necessary to determine the oldest
unfrozen XID that will remain in the table at the end of an aggressive
VACUUM
.
In practice, most tables require periodic aggressive vacuuming.
However, some individual non-aggressive VACUUM
operations can advance the table's
relfrozenxid
and/or
relminmxid
.
This happens whenever a non-aggressive VACUUM
notices that it can do so safely, without incurring any added cost from
scanning “extra” pages. It is most common in small, frequently
modified tables.
Non-aggressive VACUUM
s can sometimes overlook older
XIDs from existing all-visible pages (due to their policy of always
skipping all-visible pages). Over time, this can even lead to a
significant build-up of unfrozen pages in one table (accumulated
all-visible pages that remain unfrozen). When that happens, it is
inevitable that an aggressive VACUUM
will eventually
need to perform “catch-up” freezing that clears the table's
backlog of unfrozen pages.
There is also one minor behavioral difference
between aggressive VACUUM
and non-aggressive
VACUUM
: only aggressive VACUUM
is
required to sometimes wait for a page-level cleanup lock when a page is
scanned and observed to contain transaction IDs/Multixact IDs that
must be frozen. This difference exists because
aggressive VACUUM
is strictly required to advance
relfrozenxid
and/or
relminmxid
to
sufficiently recent values
[19]. The behavior can lead to occasional waits for a conflicting
buffer pin to be released by another backend. These waits are
imperceptible and harmless most of the time. In extreme cases there can
be extended waits, which can be observed under the
BufferPin
wait event in the
pg_stat_activity
view. See Table 28.4.
“Catch-up” freezing is not caused by any difference in how
vacuum_freeze_min_age
is applied by each type of
VACUUM
. It is an indirect result of
vacuum_freeze_min_age
only being applied to those
pages that VACUUM
scans (and cleanup locks) in the
first place. Therefore, it can be difficult to tune
vacuum_freeze_min_age
, especially for tables that
receive frequent non-aggressive VACUUM
s and
infrequent aggressive VACUUM
s.
Autovacuum server log reports [18] show how many transaction IDs
relfrozenxid
advanced by (if at all), and how
many Multixact IDs relminmxid
advanced by (if
at all).
The number of pages frozen, and the number of pages scanned (i.e., the
number of pages processed because they could not be
skipped using the visibility map) are also shown. This can provide
useful guidance when tuning freezing-related settings, particularly
vacuum_freeze_table_age
and
vacuum_freeze_min_age
.
vacuum_freeze_table_age controls when
VACUUM
uses its aggressive strategy. If
age(relfrozenxid)
exceeds
vacuum_freeze_table_age
at the start of
VACUUM
, VACUUM
will employ its
aggressive strategy; otherwise, its standard non-aggressive strategy is
employed. Setting vacuum_freeze_table_age
to 0 forces
VACUUM
to always use its aggressive strategy.
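For example, to force a single manual VACUUM of a hypothetical table to use the aggressive strategy:

```sql
-- Zero the threshold for this session only, then vacuum:
SET vacuum_freeze_table_age = 0;
VACUUM VERBOSE pgbench_accounts;
RESET vacuum_freeze_table_age;
```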
vacuum_multixact_freeze_table_age also controls
when VACUUM
uses its aggressive strategy. This is an
independent Multixact ID based trigger for aggressive
VACUUM
, which works just like
vacuum_freeze_table_age
. It is applied against
mxid_age(relminmxid)
at the start of each
VACUUM
.
It doesn't matter if it was vacuum_freeze_table_age
or
vacuum_multixact_freeze_table_age
that triggered
VACUUM
's decision to use its aggressive strategy.
Every aggressive VACUUM
will
advance relfrozenxid
and
relminmxid
by applying the same generic policy
that controls which pages are frozen.
The default vacuum_freeze_table_age
and
vacuum_multixact_freeze_table_age
settings are
relatively low values. The vacuum_freeze_table_age
and vacuum_freeze_min_age
defaults are intended to
limit the system to using only about 10% of the available transaction ID
space at any one time. This leaves the system with a generous amount of
“slack capacity” that allows XID allocations to continue in
the event of unforeseen problems with autovacuum and/or the application.
There might only be a negligible benefit from higher settings that aim to
reduce the number of VACUUM
s that use the aggressive
strategy, in any case. Some applications may even
benefit from tuning that makes autovacuum perform
aggressive VACUUM
s more often. If individual
aggressive VACUUM
s can perform significantly less
“catch-up” freezing as a result, overall transaction
processing throughput is likely to be more stable and predictable.
To ensure that every table has its
relfrozenxid
(and
relminmxid
) advanced at regular intervals,
even in the case of completely static tables, autovacuum runs against any
table when it attains an age considered too far in the past. These are
anti-wraparound autovacuums. In practice, all
anti-wraparound autovacuums will use VACUUM
's
aggressive strategy (if they didn't, it would defeat the whole purpose of
anti-wraparound autovacuuming).
autovacuum_freeze_max_age controls when the
autovacuum daemon launches anti-wraparound autovacuums. If the
age(relfrozenxid)
of a table exceeds
autovacuum_freeze_max_age
when the autovacuum daemon
periodically examines the database (which happens once every autovacuum_naptime seconds), then an anti-wraparound
autovacuum is launched against the table.
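autovacuum_freeze_max_age can also be set below the server-wide value for an individual table via a storage parameter (per-table values larger than the server-wide setting are ignored). For example, for a hypothetical table named archive:

```sql
-- Trigger anti-wraparound autovacuums for this table at 100 million
-- XIDs rather than the server-wide default of 200 million:
ALTER TABLE archive SET (autovacuum_freeze_max_age = 100000000);
```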
autovacuum_multixact_freeze_max_age also controls
when the autovacuum daemon launches anti-wraparound autovacuums. It is
an independent Multixact ID based trigger for anti-wraparound
autovacuuming. If the mxid_age(relminmxid)
of a table
exceeds autovacuum_multixact_freeze_max_age
when the
autovacuum daemon periodically examines the database
[20], then an anti-wraparound autovacuum is launched against the
table.
Use of VACUUM
's aggressive strategy during
anti-wraparound autovacuuming is certain, because the effective value of
vacuum_freeze_table_age
is silently limited to an
effective value no greater than 95% of the current value of
autovacuum_freeze_max_age
. Similarly, the effective
value of vacuum_multixact_freeze_table_age
is silently
limited to a value no greater than 95% of the current value of
autovacuum_multixact_freeze_max_age
.
It doesn't matter if it was autovacuum_freeze_max_age
or autovacuum_multixact_freeze_max_age
that triggered
an anti-wraparound autovacuum. Every
anti-wraparound autovacuum will be an aggressive
VACUUM
, and will therefore advance
relfrozenxid
and
relminmxid
by applying the same generic policy
that controls which pages are frozen.
Anti-wraparound autovacuums are intended for static (and mostly static)
tables. There is no reason to expect that a table receiving continual
row inserts and/or row modifications will ever require an anti-wraparound
autovacuum. As a rule of thumb,
autovacuum_freeze_max_age
should be set to a value
somewhat above vacuum_freeze_table_age
, so that there
is a long window during which any autovacuum triggered by inserts,
updates, or deletes (or any manually issued VACUUM
)
will become an aggressive VACUUM
. This has the
advantage of allowing aggressive vacuuming to take place at a time when
vacuuming was required anyway. Each aggressive VACUUM
can therefore be expected to perform just as much useful work on
recovering disk space as an equivalent non-aggressive
VACUUM
would have (had the non-aggressive strategy
been chosen instead).
Aggressive VACUUM
is a special type of
VACUUM
. It must advance
relfrozenxid
up to a value that was no
greater than vacuum_freeze_min_age
in age as of the
start of the VACUUM
operation.
Anti-wraparound autovacuum is a special type of autovacuum. Its purpose
is to ensure that relfrozenxid
advances when
no earlier VACUUM
could advance it in passing —
often because no VACUUM
has run against the table for
an extended period.
There is only one runtime behavioral difference between anti-wraparound
autovacuums and other autovacuums that run aggressive
VACUUM
s: anti-wraparound autovacuums cannot
be autocancelled. This means that autovacuum workers that
perform anti-wraparound autovacuuming do not yield to conflicting
relation-level lock requests (e.g., from ALTER
TABLE
). See Section 25.1.3 for
a full explanation.
In practice, anti-wraparound autovacuum is very likely to be the type of
autovacuum that updates the oldest relfrozenxid
in each database to a more recent value due to the presence of completely
static tables [21]. As discussed in Section 25.2.2.1,
datfrozenxid
only advances when the oldest
relfrozenxid
in the database advances
(relminmxid
likewise only advances when the
earliest datminmxid
in the database advances).
This implies that anti-wraparound autovacuum is also
very likely to be involved when any database's
datfrozenxid
/datminmxid
advances (and when the cluster-wide earliest unfrozen transaction
ID/Multixact ID is advanced to a more recent value, in turn).
It follows that autovacuum_freeze_max_age
is usually
the limiting factor for advancing the cluster-wide oldest unfrozen
transaction ID found in pg_control
(the cluster-wide oldest unfrozen Multixact ID might occasionally be
influenced by autovacuum_multixact_freeze_max_age
,
too). This usually isn't much of a concern in itself, since it generally
doesn't predict anything about how far behind autovacuum is with freezing
physical heap pages. Note, however, that this effect
can significantly impact the amount of space
required to store transaction status information. The oldest transaction
status information (which is stored in external structures such as
pg_xact
) cannot safely be truncated until
VACUUM
can ascertain that there are no references to
the oldest entries remaining in any table, from any database. See Section 25.2.4 for further details.
If for some reason autovacuum fails to advance any table's
relfrozenxid
for an extended period (during
which transaction IDs continue to be allocated), the system will begin to
emit warning messages once the database's oldest XIDs attain an age
within forty million transactions of the 2.1 billion XID hard limit
described in Section 25.2.2.1. For example:
WARNING: database "mydb" must be vacuumed within 39985967 transactions HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
(A manual VACUUM
should fix the problem, as suggested by the
hint; but note that the VACUUM
must be performed by a
superuser, else it will fail to process system catalogs and thus not
be able to advance the database's datfrozenxid
.)
If these warnings are ignored, the system will eventually refuse
to allocate new transaction IDs. This happens at the point that
there are fewer than three million transactions left:
ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb" HINT: Stop the postmaster and vacuum that database in single-user mode.
The three-million-transaction safety margin exists to let the
administrator recover without data loss, by manually executing the
required VACUUM
commands. However, since the system will not
execute commands once it has gone into the safety shutdown mode,
the only way to do this is to stop the server and start the server in single-user
mode to execute VACUUM
. The shutdown mode is not enforced
in single-user mode. See the postgres reference
page for details about using single-user mode.
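For example, assuming a hypothetical data directory of /usr/local/pgsql/data, the recovery procedure looks like this (after stopping the server):

```
$ postgres --single -D /usr/local/pgsql/data mydb

backend> VACUUM;
```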
A similar safety mechanism is used to prevent Multixact ID allocations
when any table's relminmxid
is dangerously far
in the past: Multixact ID exhaustion. If the system isn't experiencing
transaction ID exhaustion, Multixact ID exhaustion can be fixed
non-invasively by running a manual VACUUM
without
entering single-user mode (otherwise follow the procedure for transaction
ID exhaustion). See Section 28.3 for details
on how to determine which table's relminmxid
is dangerously far in the past.
Autovacuum has two different mechanisms that are designed to avoid
transaction ID exhaustion. The first mechanism is anti-wraparound
autovacuuming. There is a second, independent mechanism, used when
relfrozenxid
and/or
relminmxid
have already consumed a
significant fraction of the total available transaction ID space: the
failsafe.
The failsafe is triggered by VACUUM
when the table's
relfrozenxid
attains an age of vacuum_failsafe_age XIDs, or when the table's
relminmxid
attains an age of vacuum_multixact_failsafe_age Multixact IDs. This
happens dynamically, when the risk of eventual transaction ID (or
Multixact ID) exhaustion is deemed to outweigh the risks of not
proceeding as planned with ordinary vacuuming.
Once the failsafe triggers, VACUUM
prioritizes
advancing relfrozenxid
and/or
relminmxid
to avoid transaction ID
exhaustion. Most notably, VACUUM
bypasses any
remaining non-essential maintenance, such as index vacuuming.
Vacuum maintains a visibility map for each table to keep track of which pages contain only tuples that are known to be visible to all active transactions (and all future transactions, until the page is again modified). A separate bit tracks whether all of the tuples are frozen.
The visibility map serves two purposes.
First, VACUUM
itself can skip such pages on the
next run, since there is nothing to clean up. Even aggressive VACUUM
s
can skip pages that are both all-visible and all-frozen.
Second, it allows PostgreSQL to answer some queries using only the index, without reference to the underlying table. Since PostgreSQL indexes don't contain tuple visibility information, a normal index scan fetches the heap tuple for each matching index entry, to check whether it should be seen by the current transaction. An index-only scan, on the other hand, checks the visibility map first. If it's known that all tuples on the page are visible, the heap fetch can be skipped. This is most useful on large data sets where the visibility map can prevent disk accesses. The visibility map is vastly smaller than the heap, so it can easily be cached even when the heap is very large.
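The contrib extension pg_visibility can summarize a table's visibility map: how many pages are all-visible (skippable by index-only scans and by plain VACUUM) and how many are also all-frozen (skippable even by aggressive VACUUM). A sketch, using a hypothetical table name:

```sql
CREATE EXTENSION IF NOT EXISTS pg_visibility;

-- Returns one row with all_visible and all_frozen page counts:
SELECT * FROM pg_visibility_map_summary('pgbench_accounts');
```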
As discussed in Section 25.2.2.1, aggressive
autovacuuming plays a critical role in maintaining the XID address space
for the system as a whole. A secondary goal of this whole process is to
enable eventual truncation of the oldest transaction status information in
the database cluster
as a whole. This status information is stored in dedicated simple least-recently-used
(SLRU) caches backed by external storage (see Section 73.1). Truncation is only possible when
VACUUM
can ascertain that there are no references to
the oldest entries remaining in any table, from any database, by taking
the earliest datfrozenxid
and
datminmxid
among all databases in the cluster.
This isn't a maintenance task that affects individual tables; it's a
maintenance task that affects the whole cluster.
The space required to store transaction status information is likely to be
a low priority for most database administrators. It may occasionally be
useful to limit the maximum storage overhead used for transaction status
information by making anti-wraparound autovacuums happen more frequently.
The frequency of system-wide anti-wraparound autovacuuming increases when
autovacuum_freeze_max_age
and
autovacuum_multixact_freeze_max_age
are decreased in
postgresql.conf
. This approach is effective (at
limiting the storage required for transaction status information) because
the oldest datfrozenxid
and datminmxid
in the cluster are very likely
to depend on the frequency of anti-wraparound autovacuuming of completely
static tables. See Section 25.2.2.2 for
further discussion of the role of anti-wraparound autovacuuming in
advancing the cluster-wide oldest unfrozen transaction ID.
There are two SLRU storage areas associated with transaction IDs.
First, there is pg_xact, which stores commit/abort status information.
Second, there is pg_commit_ts, which stores transaction commit
timestamps (when track_commit_timestamp is set to on). The default
autovacuum_freeze_max_age setting of 200 million transactions
translates to about 50MB of pg_xact storage, and about 2GB of
pg_commit_ts storage when track_commit_timestamp is enabled (it is set
to off by default, which totally avoids the need to store anything in
pg_commit_ts).
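The figures quoted above follow from simple arithmetic. This sketch assumes 2 bits of commit/abort status per transaction ID in pg_xact, and a 10-byte record per transaction ID in pg_commit_ts (the record size is an implementation detail and could change across releases):

```python
# Reproduce the storage figures quoted above.
# Assumptions: pg_xact stores 2 bits of commit/abort status per XID;
# pg_commit_ts stores a 10-byte record per XID (implementation detail).

XACT_BITS = 2           # status bits per transaction ID in pg_xact
COMMIT_TS_BYTES = 10    # bytes per transaction ID in pg_commit_ts

freeze_max_age = 200_000_000   # default autovacuum_freeze_max_age

pg_xact_bytes = freeze_max_age * XACT_BITS // 8
pg_commit_ts_bytes = freeze_max_age * COMMIT_TS_BYTES

print(pg_xact_bytes)       # 50000000 bytes, i.e. about 50MB
print(pg_commit_ts_bytes)  # 2000000000 bytes, i.e. about 2GB
```

Note how the two areas scale identically with autovacuum_freeze_max_age, but pg_commit_ts is roughly 40 times larger per retained transaction ID.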
There are also two SLRU storage areas associated with Multixact IDs:
pg_multixact/members, and pg_multixact/offsets. These are logically one
storage area, implemented as two distinct storage areas. There is no
simple formula to determine the storage overhead per Multixact ID,
since Multixact IDs have a variable number of member transaction IDs
(this is what necessitates using two different physical storage areas).
Note, however, that if pg_multixact/members exceeds 2GB, the effective
value of autovacuum_multixact_freeze_max_age used by autovacuum (and
VACUUM) will be lower. This results in more frequent anti-wraparound
autovacuums (since that's the only approach that reliably limits the
size of these storage areas). It might also increase the frequency of
aggressive VACUUMs more generally.
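To see why the members area can reach its 2GB threshold long before the Multixact ID freeze age does, consider a rough sketch. The 5-bytes-per-member figure approximates the on-disk layout (a 4-byte member XID plus shared flag bytes), and the 3-member average is purely illustrative:

```python
# Rough sketch of why pg_multixact/members can hit 2GB well before
# autovacuum_multixact_freeze_max_age (default: 400 million) is reached.
# Assumptions: ~5 bytes of members storage per member XID (approximate);
# an average of 3 member XIDs per Multixact ID (illustrative only).

BYTES_PER_MEMBER = 5
CAP = 2 * 1024**3          # the 2GB threshold discussed above

members_at_cap = CAP // BYTES_PER_MEMBER
print(members_at_cap)      # 429496729 member XIDs fit under the cap

avg_members = 3
multixacts_at_cap = members_at_cap // avg_members
print(multixacts_at_cap)   # 143165576 Multixact IDs -- far below 400 million
```

The point is that member-heavy workloads exhaust the members area after far fewer Multixact IDs than the freeze age alone would suggest, which is why autovacuum falls back to a lower effective age.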
The PostgreSQL query planner relies on statistical information about
the contents of tables in order to generate good plans for queries.
These statistics are gathered by the ANALYZE command, which can be
invoked by itself or as an optional step in VACUUM. It is important to
have reasonably accurate statistics; otherwise, poor choices of plans
might degrade database performance.
The autovacuum daemon, if enabled, will automatically issue
ANALYZE
commands whenever the content of a table has
changed sufficiently. However, administrators might prefer to rely
on manually-scheduled ANALYZE
operations, particularly
if it is known that update activity on a table will not affect the
statistics of “interesting” columns. The daemon schedules
ANALYZE
strictly as a function of the number of rows
inserted or updated; it has no knowledge of whether that will lead
to meaningful statistical changes.
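The daemon's scheduling rule can be sketched as the standard analyze-threshold formula from Section 25.1, using the stock defaults autovacuum_analyze_threshold = 50 and autovacuum_analyze_scale_factor = 0.1:

```python
# Sketch of autovacuum's ANALYZE scheduling rule: launch ANALYZE once the
# rows changed since the last ANALYZE exceed
#   autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * reltuples
# Defaults shown are the stock settings (50 and 0.1).

def needs_autoanalyze(changed_rows: int, reltuples: int,
                      threshold: int = 50, scale_factor: float = 0.1) -> bool:
    return changed_rows > threshold + scale_factor * reltuples

# A 1-million-row table is re-analyzed only after ~100,050 row changes:
print(needs_autoanalyze(100_000, 1_000_000))   # False
print(needs_autoanalyze(100_051, 1_000_000))   # True
```

As the text notes, the rule is purely volumetric: 100,051 changed rows trigger ANALYZE whether or not those changes shifted any column's distribution.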
Tuples changed in partitions and inheritance children do not trigger
analyze on the parent table. If the parent table is empty or rarely
changed, it may never be processed by autovacuum, and the statistics for
the inheritance tree as a whole won't be collected. It is necessary to
run ANALYZE
on the parent table manually in order to
keep the statistics up to date.
As with vacuuming for space recovery, frequent updates of statistics
are more useful for heavily-updated tables than for seldom-updated
ones. But even for a heavily-updated table, there might be no need for
statistics updates if the statistical distribution of the data is
not changing much. A simple rule of thumb is to think about how much
the minimum and maximum values of the columns in the table change.
For example, a timestamp
column that contains the time
of row update will have a constantly-increasing maximum value as
rows are added and updated; such a column will probably need more
frequent statistics updates than, say, a column containing URLs for
pages accessed on a website. The URL column might receive changes just
as often, but the statistical distribution of its values probably
changes relatively slowly.
It is possible to run ANALYZE
on specific tables and even
just specific columns of a table, so the flexibility exists to update some
statistics more frequently than others if your application requires it.
In practice, however, it is usually best to just analyze the entire
database, because it is a fast operation. ANALYZE
uses a
statistically random sampling of the rows of a table rather than reading
every single row.
Although per-column tweaking of ANALYZE frequency might not be very
productive, you might find it worthwhile to do per-column adjustment of
the level of detail of the statistics collected by ANALYZE. Columns
that are heavily used in WHERE clauses and have highly irregular data
distributions might require a finer-grain data histogram than other
columns. See ALTER TABLE SET STATISTICS, or change the database-wide
default using the default_statistics_target configuration parameter.
Also, by default there is limited information available about the selectivity of functions. However, if you create a statistics object or an expression index that uses a function call, useful statistics will be gathered about the function, which can greatly improve query plans that use the expression index.
The autovacuum daemon does not issue ANALYZE
commands for
foreign tables, since it has no means of determining how often that
might be useful. If your queries require statistics on foreign tables
for proper planning, it's a good idea to run manually-managed
ANALYZE
commands on those tables on a suitable schedule.
The autovacuum daemon does not issue ANALYZE
commands
for partitioned tables. Inheritance parents will only be analyzed if the
parent itself is changed; changes to child tables do not trigger
autoanalyze on the parent table. If your queries require statistics on
parent tables for proper planning, it is necessary to periodically run
a manual ANALYZE
on those tables to keep the statistics
up to date.
[15]
In this regard, freezing is unlike setting transaction status
“hint bits” in tuple headers: setting hint bits doesn't
usually need to be WAL-logged, and can take place on
physical replication standby servers without input from the primary
server. Hint bits exist to allow query execution to avoid repeated
pg_xact
lookups for the same tuples, strictly as an
optimization. On the other hand, freezing exists because the system
needs to reliably remove pg_xact
dependencies from
individual tuples.
[16]
“Freezing” of xmax
fields
(whether they contain an XID or a Multixact ID) generally means clearing
xmax
from a tuple header.
VACUUM
may occasionally encounter an individual
Multixact ID that must be removed to advance the table's
relminmxid
by the required amount, which can
only be processed by generating a replacement Multixact ID (containing
just the non-removable subset of member XIDs from the original Multixact
ID), and then setting xmax
to the
new/replacement Multixact ID value.
[17]
Actually, the “freeze on an FPI write” mechanism isn't just
used when VACUUM needs to generate an FPI (as torn page protection) for
inclusion in a WAL record describing how dead tuples were removed. The
FPI mechanism also triggers when hint bits are set by VACUUM, if and
only if it necessitates writing an FPI. The need to write a WAL record
to set hint bits only arises when wal_log_hints is enabled in
postgresql.conf, or when data checksums were enabled when the cluster
was initialized with initdb.
[18]
Autovacuum's log reports appear in the server log for autovacuums
whose VACUUM
takes longer than a threshold
controlled by log_autovacuum_min_duration.
Manual VACUUMs output the same details as INFO messages when the VACUUM
command's VERBOSE option is used (note that manual VACUUMs never
generate reports in the server log).
[19]
Aggressive VACUUM is (somewhat arbitrarily) required to freeze all
pages containing transaction IDs older than vacuum_freeze_min_age
and/or Multixact ID values older than
vacuum_multixact_freeze_min_age, at a minimum.
[20]
Autovacuum may use a lower “effective” Multixact ID age
than the autovacuum_multixact_freeze_max_age setting in
postgresql.conf, though. Applying a lower “effective”
value like this avoids allowing the pg_multixact/members SLRU storage
area to continue to grow in size for long, once its size reaches 2GB.
See Section 25.2.4.
[21]
Anti-wraparound autovacuum is all but guaranteed to advance the oldest
relfrozenxid/relminmxid (and therefore to advance
datfrozenxid/datminmxid) in practice, because there is all but
guaranteed to be at least one totally static table that never gets an
aggressive VACUUM for any other reason (often just a tiny, completely
static system catalog table).