From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Joe Conway <mail(at)joeconway(dot)com>, Michael Banck <mbanck(at)gmx(dot)net>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Frédéric Yhuel <frederic(dot)yhuel(at)dalibo(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: New GUC autovacuum_max_threshold ? |
Date: | 2024-05-02 02:02:46 |
Message-ID: | CAApHDvo8DWyt4CWhF=NPeRstz_78SteEuuNDfYO7cjp=7YTK4g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, 27 Apr 2024 at 02:13, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Let's compare the current situation to the situation post-patch with a
> cap of 500k. Consider a table 1024 times larger than the one I
> mentioned above, so pgbench scale factor 25600, size on disk 320GB.
> Currently, that table will be vacuumed for bloat when the number of
> dead tuples exceeds 20% of the table size, because that's the default
> value of autovacuum_vacuum_scale_factor. The table has 2.56 billion
> tuples, so that means that we're going to vacuum it when there are
> more than 510 million dead tuples. Post-patch, we will vacuum when we
> have 500 thousand dead tuples. Suppose a uniform workload that slowly
> updates rows in the table. If we were previously autovacuuming the
> table once per day (1440 minutes) we're now going to try to vacuum it
> almost every minute (1440 minutes / 1024 = 84 seconds).
I've not checked your maths, but if that's true, that's not going to work.
I think there are fundamental problems with the parameters that drive
autovacuum that need to be addressed before we can consider a patch
like this one.
Here are some of the problems that I know about:
1. Autovacuum has exactly zero forward vision and operates reactively
rather than proactively. This "blind operating" causes tables to
either not need vacuumed or suddenly need vacuumed without any
consideration of how busy autovacuum is at that current moment.
2. There is no prioritisation for the order in which tables are autovacuumed.
3. With the default scale factor, the larger a table becomes, the more
infrequent the autovacuums.
4. Autovacuum is more likely to trigger when the system is busy
because more transaction IDs are being consumed and there is more DML
occurring. This results in autovacuum having less work to do during
quiet periods when there are more free resources to be doing the
vacuum work.
In my opinion, the main problem with Frédéric's proposed GUC/reloption
is that it increases the workload that autovacuum is responsible for
and, because of #2, it becomes more likely that autovacuum works on
some table that isn't the highest priority table to work on which can
result in autovacuum starvation of tables that are more important to
vacuum now.
I think we need to do a larger overhaul of autovacuum to improve
points 1-4 above. I also think that there's some work coming up that
might force us into this sooner than we think. As far as I understand
it, AIO will break vacuum_cost_page_miss because everything (providing
IO keeps up) will become vacuum_cost_page_hit. Maybe that's not that
important as that costing is quite terrible anyway.
Here's a sketch of an idea that's been in my head for a while:
Round 1:
1a) Give autovacuum forward vision (#1 above) and instead of vacuuming
a table when it (atomically) crosses some threshold, use the existing
scale_factors and autovacuum_freeze_max_age to give each table an
autovacuum "score", which could be a number from 0-100, where 0 means
do nothing and 100 means nuclear meltdown. Let's say a table gets 10
points for the dead tuples meeting the current scale_factor and maybe
an additional point for each 10% of proportion the size of the table
is according to the size of the database (gives some weight to space
recovery for larger tables). For relfrozenxid, make the score the
maximum of dead tuple score vs the percentage of the age(relfrozenxid)
is to 2 billion. Use a similar maximum score calc for age(relminmxid)
2 billion.
1b) Add a new GUC that defines the minimum score a table must reach
before autovacuum will consider it.
1c) Change autovacuum to vacuum the tables with the highest scores first.
Round 2:
2a) Have autovacuum monitor the score of the highest scoring table
over time with buckets for each power of 2 seconds in history from
now. Let's say 20 buckets, about 12 days of history. Migrate scores
into older buckets to track the score over time.
2b) Have autovacuum cost limits adapt according to the history so that
if the maximum score of any table is trending upwards, that autovacuum
speeds up until the score buckets trend downwards towards the present.
2c) Add another GUC to define the minimum score that autovacuum will
be "proactive". Must be less than the minimum score to consider
autovacuum (or at least, ignored unless it is.). This GUC would not
cause an autovacuum speedup due to 2b) as we'd only consider tables
which meet the GUC added in 1b) in the score history array in 2a).
This stops autovacuum running faster than autovacuum_cost_limit when
trying to be proactive.
While the above isn't well a well-baked idea. The exact way to
calculate the scores isn't well thought through, certainly. However, I
do think it's an idea that we should consider and improve upon. I
believe 2c) helps solve the problem of large tables becoming bloated
as autovacuum could get to these sooner when the workload is low
enough for it to run proactively.
I think we need at least 1a) before we can give autovacuum more work
to do, especially if we do something like multiply its workload by
1024x, per your comment above.
David
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2024-05-02 02:23:13 | Re: [PATCH] json_lex_string: don't overread on bad UTF8 |
Previous Message | Jacob Champion | 2024-05-01 23:22:24 | Re: [PATCH] json_lex_string: don't overread on bad UTF8 |