PostgreSQL databases require periodic
maintenance known as vacuuming, and require
periodic updates to the statistics used by the
PostgreSQL query planner. These
maintenance tasks are performed by the VACUUM
and ANALYZE
commands
respectively. For most installations, it is sufficient to let the
autovacuum daemon determine when to perform
these maintenance tasks (which is partly determined by configurable
table-level thresholds; see Section 25.1).
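As a sketch of such table-level tuning (the table name and values here are hypothetical illustrations, not recommendations), the thresholds can be set as storage parameters:

```sql
-- Make autovacuum consider this table eligible for vacuuming sooner:
-- trigger after roughly 5% of rows (plus a small base threshold) are dead.
ALTER TABLE pgbench_accounts
    SET (autovacuum_vacuum_scale_factor = 0.05,
         autovacuum_vacuum_threshold = 1000);
```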
The autovacuum daemon has to process each table on a regular basis for several reasons:

1. To recover or reuse disk space occupied by dead row versions.
2. To freeze old transaction IDs (and MultiXact IDs) stored in heap rows.
3. To update the visibility map, which speeds up index-only scans and helps a later VACUUM
operation avoid needlessly scanning pages that are already
frozen.
4. To advance relfrozenxid and relminmxid, protecting against the system refusing to allocate new transaction IDs.
5. To maintain the data statistics used by the PostgreSQL query planner.

Maintenance work within the scope of items 1, 2, 3, and 4 is
performed by the VACUUM
command internally.
Item 5 (maintenance of planner statistics) is handled by the
ANALYZE
command internally. Although this
section presents information about autovacuum, there is no
difference between manually-issued VACUUM
and
ANALYZE
commands and those run by the autovacuum
daemon (though there are autovacuum-specific variants of a small
number of settings that control VACUUM
).
Autovacuum creates a substantial amount of I/O traffic, which can cause poor performance for other active sessions. There are configuration parameters that can be adjusted to reduce the performance impact of background vacuuming. See the autovacuum-specific cost delay settings described in Section 20.10, and additional cost delay settings described in Section 20.4.4.
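For example, the cost delay can be raised for a single busy table via a storage parameter (a sketch; the table name is hypothetical):

```sql
-- Throttle background vacuuming of this table more heavily
-- by adding a longer per-table cost delay (in milliseconds).
ALTER TABLE orders SET (autovacuum_vacuum_cost_delay = 10);
```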
Some database administrators will want to supplement the daemon's
activities with manually-managed VACUUM
commands, which typically are executed according to a schedule by
cron or Task
Scheduler scripts. It can be useful to run
VACUUM
commands during off-hours periods, when
reduced load is expected.
In PostgreSQL, an
UPDATE
or DELETE
of a row does not
immediately remove the old version of the row.
This approach is necessary to gain the benefits of multiversion
concurrency control (MVCC, see Chapter 13): the row version
must not be deleted while it is still potentially visible to other
transactions. A deleted row version (whether from an
UPDATE
or DELETE
) will
usually cease to be of interest to any still running transaction
shortly after the original deleting transaction commits.
The space dead tuples occupy must eventually be reclaimed for
reuse by new rows, to avoid unbounded growth of disk space
requirements. Reclaiming space from dead rows is
VACUUM
's main responsibility.
The XID cutoff point that VACUUM
uses to
determine whether or not a deleted tuple is safe to physically
remove is reported under removable cutoff
in
the server log when autovacuum logging (controlled by log_autovacuum_min_duration) reports on a
VACUUM
operation executed by autovacuum.
Tuples that are not yet safe to remove are counted as
dead but not yet removable
tuples in the log
report. VACUUM
establishes its
removable cutoff
once, at the start of the
operation. Any older snapshot (or transaction that allocates an
XID) that's still running when the cutoff is established may hold
it back.
It's important that no long-running transaction ever be allowed
to hold back every VACUUM
operation's cutoff
for an extended period. You may wish to monitor for this.
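One way to monitor this is to look for sessions whose transactions have old XIDs or old snapshots (a sketch; adjust the ordering and any thresholds to taste):

```sql
-- Find sessions that may be holding back VACUUM's removable cutoff.
-- backend_xid is the session's own XID; backend_xmin is its snapshot horizon.
SELECT pid, state,
       age(backend_xid)   AS xid_age,
       age(backend_xmin)  AS xmin_age,
       now() - xact_start AS xact_duration
FROM pg_stat_activity
WHERE backend_xid IS NOT NULL OR backend_xmin IS NOT NULL
ORDER BY greatest(age(backend_xid), age(backend_xmin)) DESC NULLS LAST;
```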
Tuples inserted by aborted transactions can be removed by
VACUUM
immediately.
VACUUM
will not return space to the operating
system, except in the special case where a group of contiguous
pages at the end of a table become entirely free and an exclusive
table lock can be easily obtained. This relation truncation
behavior can be disabled in tables where the exclusive lock is
disruptive by setting the table's vacuum_truncate
storage parameter to off
.
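For instance (the table name is hypothetical):

```sql
-- Prevent VACUUM from attempting end-of-table truncation on this table,
-- avoiding the ACCESS EXCLUSIVE lock that truncation requires.
ALTER TABLE event_log SET (vacuum_truncate = off);
```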
If you have a table whose entire contents are deleted on a
periodic basis, consider doing it with TRUNCATE
rather
than relying on VACUUM
.
TRUNCATE
removes the entire contents of the
table immediately, avoiding the need to set
xmax
to the deleting transaction's XID.
One disadvantage is that strict MVCC semantics are violated.
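For example, instead of bulk-deleting (a sketch; the table name is hypothetical):

```sql
-- DELETE leaves behind dead row versions that VACUUM must later reclaim:
DELETE FROM staging_data;

-- TRUNCATE removes the table's contents and reclaims space immediately:
TRUNCATE staging_data;
```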
VACUUM FULL
or CLUSTER
can
be useful when dealing with extreme amounts of dead tuples. These
commands can reclaim more disk space, but run much more slowly. They
rewrite an entire new copy of the table and rebuild all indexes.
This typically has much higher overhead than
VACUUM
. Generally, therefore, administrators
should avoid using VACUUM FULL
except in the
most extreme cases.
Although VACUUM FULL
is technically an option
of the VACUUM
command, VACUUM
FULL
uses a completely different implementation.
VACUUM FULL
is essentially a variant of
CLUSTER
. (The name VACUUM
FULL
is historical; the original implementation was
somewhat closer to standard VACUUM
.)
TRUNCATE
, VACUUM FULL
, and
CLUSTER
all require an ACCESS
EXCLUSIVE
lock, which can be highly disruptive
(SELECT
, INSERT
,
UPDATE
, and DELETE
commands
will all be blocked).
VACUUM FULL
(and CLUSTER
)
temporarily uses extra disk space approximately equal to the size
of the table, since the old copies of the table and indexes can't
be released until the new ones are complete.
VACUUM
often marks certain pages
frozen, indicating that all eligible rows on
the page were inserted by a transaction that committed
sufficiently far in the past that the effects of the inserting
transaction are certain to be visible to all current and future
transactions. The specific transaction ID number
(XID) stored in a frozen heap
row's xmin
field is no longer needed to
determine anything about the row's visibility.
Furthermore, when a row undergoing freezing happens to have an XID
set in its xmax
field (possibly an XID
left behind by an earlier SELECT FOR UPDATE
row
locker), the xmax
field's XID is
typically also removed (actually, xmax
is set to the special XID value 0
, also known
as InvalidTransactionId
). See Section 73.6.1.1 for further background
information.
Once frozen, heap pages are “self-contained”. All of
the page's rows can be read by every transaction, without any
transaction ever needing to consult externally stored transaction
status metadata (most notably, transaction commit/abort status
information from pg_xact
won't ever be
required).
It can be useful for VACUUM
to put off some of
the work of freezing, but freezing cannot be put off indefinitely.
Since on-disk storage of transaction IDs in heap row headers uses
a truncated 32-bit representation to save space (rather than the
full 64-bit representation), freezing plays a crucial role in
enabling management of the XID
address space by VACUUM
. If freezing
by VACUUM
is somehow impeded (in a database
that continues to allocate new transaction IDs), the system will
eventually refuse to allocate new
transaction IDs. This generally only happens in extreme
cases where the system has been misconfigured.
vacuum_freeze_min_age can be used to control
when freezing takes place. When VACUUM
scans a
heap page containing even one XID that already attained an age
exceeding this value, the page is frozen.
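Freezing can also be requested explicitly, which makes VACUUM freeze every eligible tuple regardless of age (a sketch; the table name is hypothetical):

```sql
-- FREEZE is equivalent to running with vacuum_freeze_min_age set to zero:
VACUUM (FREEZE, VERBOSE) mytable;
```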
Multixact IDs are used to support row
locking by multiple transactions. Since there is only limited
space in a tuple header to store lock information, that
information is encoded as a “multiple transaction
ID”, or multixact ID for short, whenever there is more
than one transaction concurrently locking a row. Information
about which transaction IDs are included in any particular
multixact ID is stored separately, and only the multixact ID
appears in the xmax
field in the tuple
header. Like transaction IDs, multixact IDs are implemented as a
32-bit counter and corresponding storage. Since MultiXact IDs are
stored in the xmax
field of heap rows
(and have an analogous dependency on external transaction status
information), they may also need to be removed during freezing.
vacuum_multixact_freeze_min_age also
controls when freezing takes place. It is analogous to
vacuum_freeze_min_age
, but “age”
is expressed in units of Multixact ID (not in units of XID).
vacuum_multixact_freeze_min_age
typically has
only a minimal impact on how many pages are frozen, partly because
VACUUM
usually prefers to remove MultiXact IDs
proactively based on low-level considerations around the cost of
freezing. vacuum_multixact_freeze_min_age
forces VACUUM
to process
MultiXact IDs in certain rare cases where the implementation would
not ordinarily do so.
Managing the added WAL volume from freezing
over time is an important consideration for
VACUUM
. This is why VACUUM
doesn't just freeze every eligible tuple at the earliest
opportunity: the WAL written to freeze a page's
tuples “goes to waste” in cases where the resulting
frozen tuples are soon deleted or updated anyway. It's also why
VACUUM
will freeze all
eligible tuples from a heap page once the decision to freeze at
least one tuple is taken: at that point the added cost to freeze
all eligible tuples eagerly (measured in “extra bytes of
WAL written”) is far lower than the
probable cost of deferring freezing until a future
VACUUM
operation against the same table.
Furthermore, once the page is frozen it can be marked all-frozen
in the visibility map right away.
In PostgreSQL versions before 16,
freezing was triggered at the level of individual
xmin
and
xmax
fields. Freezing only affected
the exact XIDs that had already attained an age at or exceeding
vacuum_freeze_min_age
, regardless of costs.
VACUUM
also triggers freezing of pages in cases
where it already proved necessary to write out an
FPI (full page image) alongside a
WAL record generated while removing dead tuples
(see Section 30.1 for background information
about how FPIs provide torn page protection).
This “freeze on an FPI write”
mechanism is designed to lower the absolute volume of
WAL written over time by
VACUUM
, across multiple
VACUUM
operations against the same table. The
mechanism often prevents future VACUUM
operations from having to write a second FPI
for the same page much later on. In effect,
VACUUM
writes slightly more
WAL in the short term with the aim of
ultimately needing to write much less WAL in
the long term.
VACUUM
may not be able to freeze every tuple's
xmin
in relatively rare cases. The
criterion that determines basic eligibility for freezing is exactly
the same as the one that determines whether a deleted tuple should be
considered removable
or merely dead
but not yet removable
(namely, the XID-based
removable cutoff
). In extreme cases a
long-running transaction can hold back every
VACUUM
's removable cutoff for so long that the
system is eventually forced to activate xidStopLimit
mode
protections.
VACUUM's aggressive strategy
As already noted briefly in the introductory section, freezing
doesn't just allow queries to avoid lookups of subsidiary
transaction status information in structures such as
pg_xact
. Freezing also plays a crucial role
in enabling management of the XID address space by
VACUUM
.
VACUUM
maintains information about the oldest
unfrozen XID that remains in the table when it uses its
aggressive strategy. This information is
stored in the pg_class
system table at
the end of each aggressive VACUUM
: the table
processed by aggressive VACUUM
has its
pg_class
.relfrozenxid
updated (relfrozenxid
“advances” by a certain number of XIDs). Similarly,
the datfrozenxid
column of a
database's pg_database
row is a lower
bound on the unfrozen XIDs appearing in that database — it
is just the minimum of the per-table
relfrozenxid
values within the
database. The system also maintains
pg_class
.relminmxid
and
pg_database
.datminmxid
fields to track the oldest MultiXact ID, while following
analogous rules.
When the VACUUM
command's
VERBOSE
parameter is specified,
VACUUM
prints various statistics about the
table. This includes information about how
relfrozenxid
and
relminmxid
advanced, and the number
of newly frozen pages. The same details appear in the server
log when autovacuum logging (controlled by log_autovacuum_min_duration) reports on a
VACUUM
operation executed by autovacuum.
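For example (the table name is hypothetical):

```sql
-- Print per-table statistics, including how relfrozenxid and relminmxid
-- advanced and the number of newly frozen pages:
VACUUM (VERBOSE) mytable;
```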
This process is intended to reliably prevent the entire database
from ever having a transaction ID that is excessively far in the
past. The maximum “distance” that the system can
tolerate between the oldest unfrozen transaction ID and the next
(unallocated) transaction ID is about 2.1 billion transaction
IDs. That is an upper limit; the greatest
age(relfrozenxid)
/age(datfrozenxid)
in the system should ideally never exceed a fraction of that
upper limit. If that upper limit is ever reached, then the
system will activate xidStopLimit
mode
protections. These protections will remain in force
until VACUUM
(typically autovacuum) manages to
advance the oldest datfrozenxid
in the cluster
(by advancing that database's oldest relfrozenxid via an
aggressive VACUUM
).
The 2.1 billion XIDs “distance” invariant is a
consequence of the fact that on-disk storage of transaction IDs
in heap row headers uses a truncated 32-bit representation to
save space (rather than the full 64-bit representation). Since
all unfrozen transaction IDs from heap tuple headers
must be from the same transaction ID epoch
(which is what the invariant actually assures), there isn't any
need to store a separate epoch field in each tuple header. The
downside is that the system depends on freezing (and
relfrozenxid
advancement during
aggressive VACUUM
s) to make sure that the
“available supply” of transaction IDs never exceeds
the “demand”.
In practice most tables require periodic aggressive vacuuming.
However, some individual non-aggressive
VACUUM
operations may be able to advance
relfrozenxid
and/or
relminmxid
. This is most common in
small, frequently modified tables, where
VACUUM
happens to scan all pages (or at least
all pages not marked all-frozen in the visibility map) in the
course of removing dead tuples.
VACUUM
/autovacuum also use the vacuum_multixact_freeze_table_age and autovacuum_multixact_freeze_max_age settings as
independent MultiXact ID oriented controls for aggressive mode
VACUUM
and anti-wraparound autovacuum.
These work analogously to the XID-based
vacuum_freeze_table_age
and
autovacuum_freeze_max_age
, respectively.
Note, however, that if the multixact members storage
area exceeds 2GB, then the effective value of
autovacuum_multixact_freeze_max_age
will be
lower, resulting in more frequent aggressive mode VACUUMs.
There is only one major runtime behavioral difference between
aggressive mode VACUUM
and non-aggressive
(standard) VACUUM
. Both kinds of
VACUUM
use the visibility map to determine which
pages of a table must be scanned, and which can be skipped.
However, only non-aggressive VACUUM
will skip
pages that don't have any dead row versions even if those pages
might still have row versions with old XID values; aggressive
VACUUM
s are limited to skipping pages already
marked all-frozen (and all-visible).
As a consequence of all this, non-aggressive
VACUUM
s usually won't freeze
every page with an old row version in the
table. Most individual tables will eventually need an aggressive
VACUUM
, which will reliably freeze all pages
with XID and MXID values older than
vacuum_freeze_min_age
, including those from
all-visible but not all-frozen pages (and then update
pg_class
). vacuum_freeze_table_age controls when
VACUUM
must use its aggressive strategy.
Since the setting is applied against
age(relfrozenxid)
, settings like
vacuum_freeze_min_age
may influence the exact
cadence of aggressive vacuuming. Setting
vacuum_freeze_table_age
to 0 forces
VACUUM
to always use its aggressive strategy.
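For a one-off aggressive pass over a single table, the setting can be overridden locally (a sketch; the table name is hypothetical):

```sql
-- Force the aggressive strategy for just this VACUUM:
SET vacuum_freeze_table_age = 0;
VACUUM mytable;
RESET vacuum_freeze_table_age;
```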
Aggressive VACUUM
s apply the same rules for
freezing as non-aggressive VACUUM
s. You may
nevertheless notice that aggressive VACUUM
s
perform a disproportionately large amount of the total required
freezing in larger tables.
This is an indirect consequence of the fact that non-aggressive
VACUUM
s won't scan pages that are marked
all-visible but not also marked all-frozen in the visibility
map. VACUUM
can only consider freezing those
pages that it actually gets to scan.
Note in particular that vacuum_freeze_min_age
isn't very likely to trigger freezing in non-aggressive
VACUUM
s, at least with default settings. The
“freeze on an FPI write”
mechanism is somewhat more likely to trigger in non-aggressive
VACUUM
s in practice, though. Much depends on
workload characteristics.
To ensure that every table has its
relfrozenxid
advanced at somewhat
regular intervals, including totally static tables, autovacuum is
invoked on any table that might contain unfrozen rows with XIDs
older than the age specified by the configuration parameter autovacuum_freeze_max_age. This will happen
even if autovacuum is disabled.
In practice all anti-wraparound autovacuums will use
VACUUM
's aggressive strategy. This is assured
because the effective value of
vacuum_freeze_table_age
is
“clamped” to a value no greater than 95% of the
current value of autovacuum_freeze_max_age
.
As a rule of thumb, vacuum_freeze_table_age
should be set to a value somewhat below
autovacuum_freeze_max_age
, leaving enough gap
so that a regularly scheduled VACUUM
or an
autovacuum triggered by inserts, updates and deletes is run in
that window. Anti-wraparound autovacuums can be avoided
altogether in tables that reliably receive
some VACUUM
s that use the
aggressive strategy.
A convenient way to examine information about
relfrozenxid
and
relminmxid
is to execute queries such as:
SELECT c.oid::regclass AS table_name,
       greatest(age(c.relfrozenxid), age(t.relfrozenxid)) AS xid_age,
       mxid_age(c.relminmxid)
FROM pg_class c
LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
WHERE c.relkind IN ('r', 'm');

SELECT datname,
       age(datfrozenxid) AS xid_age,
       mxid_age(datminmxid)
FROM pg_database;
The age
column measures the number of
transactions from the cutoff XID to the next unallocated
transaction ID. The mxid_age
column
measures the number of MultiXactIds from the cutoff MultiXactId
to the next unallocated multixact ID.
xidStopLimit
mode
If for some reason autovacuum utterly fails to advance any
table's relfrozenxid
or
relminmxid
for an extended period, and
if XIDs and/or MultiXactIds continue to be allocated, the system
will begin to emit warning messages like this when the database's
oldest XIDs reach forty million transactions from the wraparound
point:
WARNING:  database "mydb" must be vacuumed within 39985967 transactions
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in that database.
(A manual VACUUM
should fix the problem, as suggested by the
hint; but note that the VACUUM
must be performed by a
superuser, else it will fail to process system catalogs and thus not
be able to advance the database's datfrozenxid
.)
If these warnings are ignored, the system will eventually refuse
to start any new transactions. This happens at the point that
there are fewer than three million transactions left:
ERROR:  database is not accepting commands to avoid wraparound data loss in database "mydb"
HINT:  Stop the postmaster and vacuum that database in single-user mode.
The three-million-transaction safety margin exists to let the
administrator recover without data loss, by manually executing the
required VACUUM
commands. However, since the system will not
execute commands once it has gone into the safety shutdown mode,
the only way to do this is to stop the server and restart it in single-user
mode to execute VACUUM
. The shutdown mode is not enforced
in single-user mode. See the postgres reference
page for details about using single-user mode.
Anything that influences when and how
relfrozenxid
and
relminmxid
advance will also directly
affect the high watermark storage overhead from storing a great
deal of historical transaction status information. The
additional space
overhead is usually of fairly minimal concern. It is
noted as an additional downside of allowing the system to get
close to xidStopLimit
for the sake of
completeness.
The term “wraparound” is something of a misnomer; despite the message's wording, no data has actually been lost at this point. The xidStopLimit protections exist precisely to prevent that outcome.
In emergencies, VACUUM
will take extraordinary
measures to avoid xidStopLimit
mode. A
failsafe mechanism is triggered when thresholds controlled by
vacuum_failsafe_age and vacuum_multixact_failsafe_age are reached. The
failsafe prioritizes advancing
relfrozenxid
and/or
relminmxid
as quickly as possible.
Once the failsafe triggers, VACUUM
bypasses
all remaining non-essential maintenance tasks, and stops applying
any cost-based delay that was in effect. Any Buffer Access
Strategy in use will also be disabled.
Vacuum maintains a visibility
map for each table to keep track of which pages contain
only tuples that are known to be visible to all active
transactions (and all future transactions, until the page is again
modified). This has two purposes. First, vacuum itself can skip
such pages on the next run, since there is nothing to clean up.
Even VACUUM
s that use the aggressive strategy can skip
pages that are both all-visible and all-frozen (the visibility map
keeps track of which pages are all-frozen separately).
Second, it allows PostgreSQL to answer some queries using only the index, without reference to the underlying table. Since PostgreSQL indexes don't contain tuple visibility information, a normal index scan fetches the heap tuple for each matching index entry, to check whether it should be seen by the current transaction. An index-only scan, on the other hand, checks the visibility map first. If it's known that all tuples on the page are visible, the heap fetch can be skipped. This is most useful on large data sets where the visibility map can prevent disk accesses. The visibility map is vastly smaller than the heap, so it can easily be cached even when the heap is very large.
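The fraction of a table's pages currently marked all-visible can be estimated from pg_class (a sketch; relallvisible is only maintained approximately, and the table name is hypothetical):

```sql
-- Rough visibility-map coverage for one table:
SELECT relallvisible, relpages,
       round(100.0 * relallvisible / greatest(relpages, 1), 1) AS pct_all_visible
FROM pg_class
WHERE relname = 'mytable';
```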
As noted in Section 25.2.2.2, anything that
influences when and how relfrozenxid
and
relminmxid
advance will also directly
affect the high watermark storage overhead needed to store
historical transaction status information. For example,
increasing autovacuum_freeze_max_age
(and
vacuum_freeze_table_age
along with it) will
make the pg_xact
and
pg_commit_ts
subdirectories of the database
cluster take more space, because they store the commit status and
(if track_commit_timestamp
is enabled)
timestamp of all transactions back to the
datfrozenxid
horizon (the earliest
datfrozenxid
in the entire cluster).
The commit status uses two bits per transaction. The default
autovacuum_freeze_max_age
setting of 200
million transactions translates to about 50MB of
pg_xact
storage. When
track_commit_timestamp
is enabled, about 2GB of
pg_commit_ts
storage will also be required.
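The 50MB figure follows directly from the two-bits-per-transaction encoding: 200 million transactions at four transactions per byte is roughly 50 million bytes. As a quick check:

```sql
-- 200 million XIDs / 4 XIDs-per-byte, about 50MB of pg_xact storage:
SELECT pg_size_pretty((200000000 / 4)::bigint);
```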
MultiXactId status information is implemented as two separate
SLRU storage areas:
pg_multixact/members
, and
pg_multixact/offsets
. There is no simple
formula to determine the storage overhead per MultiXactId, since
MultiXactIds have a variable number of member XIDs.
Truncating of transaction status information is only possible at
the end of VACUUM
s that advance
relfrozenxid
(in the case of
pg_xact
and
pg_commit_ts
) or
relminmxid
(in the case of
pg_multixact/members
and
pg_multixact/offsets
) of whatever table
happened to have the oldest value in the cluster when the
VACUUM
began. This typically happens very
infrequently, often during aggressive strategy
VACUUM
s of one of the database's largest
tables.
The PostgreSQL query planner relies on
statistical information about the contents of tables in order to
generate good plans for queries. These statistics are gathered by
the ANALYZE
command,
which can be invoked by itself or
as an optional step in VACUUM
. It is important to have
reasonably accurate statistics, otherwise poor choices of plans might
degrade database performance.
The autovacuum daemon, if enabled, will automatically issue
ANALYZE
commands whenever the content of a table has
changed sufficiently. However, administrators might prefer to rely
on manually-scheduled ANALYZE
operations, particularly
if it is known that update activity on a table will not affect the
statistics of “interesting” columns. The daemon schedules
ANALYZE
strictly as a function of the number of rows
inserted or updated; it has no knowledge of whether that will lead
to meaningful statistical changes.
Tuples changed in partitions and inheritance children do not trigger
analyze on the parent table. If the parent table is empty or rarely
changed, it may never be processed by autovacuum, and the statistics for
the inheritance tree as a whole won't be collected. It is necessary to
run ANALYZE
on the parent table manually in order to
keep the statistics up to date.
As with vacuuming for space recovery, frequent updates of statistics
are more useful for heavily-updated tables than for seldom-updated
ones. But even for a heavily-updated table, there might be no need for
statistics updates if the statistical distribution of the data is
not changing much. A simple rule of thumb is to think about how much
the minimum and maximum values of the columns in the table change.
For example, a timestamp
column that contains the time
of row update will have a constantly-increasing maximum value as
rows are added and updated; such a column will probably need more
frequent statistics updates than, say, a column containing URLs for
pages accessed on a website. The URL column might receive changes just
as often, but the statistical distribution of its values probably
changes relatively slowly.
It is possible to run ANALYZE
on specific tables and even
just specific columns of a table, so the flexibility exists to update some
statistics more frequently than others if your application requires it.
In practice, however, it is usually best to just analyze the entire
database, because it is a fast operation. ANALYZE
uses a
statistically random sampling of the rows of a table rather than reading
every single row.
Although per-column tweaking of ANALYZE
frequency might not be
very productive, you might find it worthwhile to do per-column
adjustment of the level of detail of the statistics collected by
ANALYZE
. Columns that are heavily used in WHERE
clauses and have highly irregular data distributions might require a
finer-grained data histogram than other columns. See ALTER TABLE
SET STATISTICS
, or change the database-wide default using the default_statistics_target configuration parameter.
Also, by default there is limited information available about the selectivity of functions. However, if you create a statistics object or an expression index that uses a function call, useful statistics will be gathered about the function, which can greatly improve query plans that use the expression index.
The autovacuum daemon does not issue ANALYZE
commands for
foreign tables, since it has no means of determining how often that
might be useful. If your queries require statistics on foreign tables
for proper planning, it's a good idea to run manually-managed
ANALYZE
commands on those tables on a suitable schedule.
The autovacuum daemon does not issue ANALYZE
commands
for partitioned tables. Inheritance parents will only be analyzed if the
parent itself is changed; changes to child tables do not trigger
autoanalyze on the parent table. If your queries require statistics on
parent tables for proper planning, it is necessary to periodically run
a manual ANALYZE
on those tables to keep the statistics
up to date.
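A scheduled job might therefore run something like the following (the table name is hypothetical):

```sql
-- Gather statistics for the partitioned (or inheritance parent) table itself;
-- autovacuum will not do this on the basis of changes to its children.
ANALYZE measurements;
```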