Vacuum statistics

From: Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>
To: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrei Zubkov <zubkov(at)moonset(dot)ru>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Vacuum statistics
Date: 2024-05-30 17:33:51
Message-ID: bee34abe-fe87-49b3-8523-8ae19e641ccf@yandex.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, everyone!

I think we don't have enough information to analyze vacuum functionality.

Needless to say that the vacuum is the most important process for a
database system. It prevents problems like table and index bloating and
emergency freezing if we have a wraparound problem. Furthermore, it
keeps the visibility map up to date. On the other hand, because of
incorrectly adjusted aggressive settings of autovacuum it can consume a
lot of computing resources that lead to all queries to the system
running longer.

Nowadays the vacuum gathers statistical information about tables, but it
is important not for optimizer only.

Because the vacuum is an automation process, there are a lot of settings
that determine their aggressive functionality to other objects of the
database. Besides, sometimes it is important to set a correct parameter
for the specified table, because of its dynamic changes.

An administrator of a database needs to set the settings of autovacuum
to have a balance between the vacuum's useful action in the database
system on the one hand, and the overhead of its workload on the other.
However, it is not enough for him to decide on vacuum functionality
through statistical information about the number of vacuum passes
through tables and operational data from progress_vacuum, because it is
available only during vacuum operation and does not provide a strategic
overview over the considered period.

To sum up, an automation vacuum has a strategic behavior because the
frequency of its functionality and resource consumption depends on the
workload of the database. Its workload on the database is minimal for an
append-only table and it is a maximum for the table with a
high-frequency updating. Furthermore, there is a high dependence of the
vacuum load on the number and volume of indexes. Because of the absence
of the visibility map for indexes, the vacuum scans the index
completely, and the worst situation when it needs to do it during a
bloating index situation in a small table.

I suggest gathering information about vacuum resource consumption for
processing indexes and tables and storing it in the table and index
relationships (for example, PgStat_StatTabEntry structure like it has
realized for usual statistics). It will allow us to determine how well
the vacuum is configured and evaluate the effect of overhead on the
system at the strategic level, the vacuum has gathered this information
already, but this valuable information doesn't store it.

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Christensen 2024-05-30 17:59:06 Re: [PATCH] psql: \dn+ to show size of each schema (and \dA+ for AMs)
Previous Message Andrew Dunstan 2024-05-30 17:28:26 Re: The xversion-upgrade test fails to stop server