Introduce some randomness to autovacuum

From: Junwang Zhao <zhjwpku(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Introduce some randomness to autovacuum
Date: 2025-04-25 14:02:49
Message-ID: CAEG8a3+3fwQbgzak+h3Q7Bp=vK_aWhw1X7w7g5RCgEW9ufdvtA@mail.gmail.com
Lists: pgsql-hackers

Hi hackers,

After watching Robert's talk[1] on autovacuum and participating in the related
workshop yesterday, it appears that people are inclined to use prioritization
to address the issues highlighted in Robert's presentation. Here I list two
of the failure modes that were discussed.

- Spinning. Running repeatedly on the same table but not accomplishing
anything useful.
- Starvation. Autovacuum can't vacuum everything that needs vacuuming.
- ...

The prioritization approach needs some basic infrastructure that PostgreSQL
doesn't have yet.

I had a random thought that introducing some randomness might help
mitigate some of the issues mentioned above. Before performing vacuum
on the collected tables, we could rotate the table_oids list by a random
offset within the range [0, list_length(table_oids) - 1]. This way, every
table would have an equal chance of being vacuumed first, which should
mitigate both spinning and starvation.

Even if there is a broken table that repeatedly gets stuck, this random
approach would still provide opportunities for other tables to be vacuumed.
Eventually, the system would converge.

The change would look something like the following. I haven't tested the
code; I'm posting it here only for discussion. Let me know your thoughts.

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 16756152b71..6dddd273d22 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -79,6 +79,7 @@
#include "catalog/pg_namespace.h"
#include "commands/dbcommands.h"
#include "commands/vacuum.h"
+#include "common/pg_prng.h"
#include "common/int.h"
#include "lib/ilist.h"
#include "libpq/pqsignal.h"
@@ -2267,6 +2268,25 @@ do_autovacuum(void)
                                           "Autovacuum Portal",
                                           ALLOCSET_DEFAULT_SIZES);

+    /*
+     * Randomly rotate the list of tables to vacuum. This is to avoid
+     * always vacuuming the same table first, which could lead to spinning
+     * on the same table or vacuuming starvation.
+     */
+    if (list_length(table_oids) > 2)
+    {
+        int rand = 0;
+        static pg_prng_state prng_state;
+        List *tmp_oids = NIL;
+
+        pg_prng_seed(&prng_state, (uint64) (getpid() ^ time(NULL)));
+        rand = (int) pg_prng_uint64_range(&prng_state, 0, list_length(table_oids) - 1);
+        if (rand != 0) {
+            tmp_oids = list_copy_head(table_oids, rand);    /* first "rand" entries */
+            table_oids = list_copy_tail(table_oids, rand);  /* remaining entries */
+            table_oids = list_concat(table_oids, tmp_oids);
+        }
+    }
     /*
      * Perform operations on collected tables.
      */

[1] How Autovacuum Goes Wrong: And Can We Please Make It Stop Doing That?
https://www.youtube.com/watch?v=RfTD-Twpvac

--
Regards
Junwang Zhao
