Re: finding tables about to be vacuum freezed

From: Steve Kehlet <steve(dot)kehlet(at)gmail(dot)com>
To: Forums postgresql <pgsql-general(at)postgresql(dot)org>
Subject: Re: finding tables about to be vacuum freezed
Date: 2015-05-12 16:51:04
Message-ID: CA+bfosFLP+MPz1JcaRym0BMfzNywpF3TKMuo+2+5cRTskyBbEw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, May 6, 2015 at 7:24 PM Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> I've booked-marked these but haven't really looked into them to any
> extent. It would be awesome if you put the SQL one somewhere on
> http://wiki.postgresql.org. That way it is easier to find, and anyone
> who finds it can contribute explanations, corrections, and update it to
> keep up with changes to the database.
>

I'll see if I can do this!

> I don't have any experience with 3.5TB databases, but I certainly think
> that that is something to worry about.
>

We did hit the autovacuum_freeze_max_age threshold on Saturday and the
autovacuumer has been running for days now, slowly cranking through each
table. Fortunately, I had autovacuum_vacuum_cost_delay set so the IO impact
isn't bad: no complaints from the customer, and our ops group says the IO
load is okay. So Postgres is just quietly doing its thing. This has clearly
happened numerous times before, and explains a few mysterious incidents in
the past where a nightly analyze script has hung for several days. It's
really great to understand this better now.

> There are two main problems you are likely to encounter (from what I
> know):
>
> One is that the autovacuum scheduler deals poorly with a database
> exceeding autovacuum_freeze_max_age. It forces all available to autovacuum
> resources to be directed to that database, starving any other database of
> attention. If you have multiple active databases, by the time one database
> has been frozen enough to no longer exceed autovacuum_freeze_max_age, the
> other one(s) might be horribly bloated. If your cluster only has one
> active database in it, this won't be a problem. The one that gets all the
> attention is the one that needs all the attention. But if you have
> multiple active databases in your cluster, this could be a problem.
>

Fortunately in this case it's just one database, but good to know.

> The other problem is that autovac takes a fairly strong lock out on the
> table while it is vacuuming it. Normally it relinquishes the lock once it
> realizes someone else is waiting on it. But in the case of a forced
> full-table scan (either autovacuum_freeze_max_age or
> vacuum_freeze_table_age is exceeded), it refuses to relinquish the lock.
> This means that any process which needs a strong-ish table lock (add
> column, drop column, create index, drop index, cluster, truncate, reindex,
> etc.) is going to block for potentially a very very long time. This is
> only a problem if you actually have such processes. If all you do is
> select, insert, update, delete, none of those things will be blocked.
>

You're right, that was the exact problem that got me going down this path.
Some of our guys were doing a software update and some CREATE INDEX
operations it wanted to do were blocked by the autovacuumer. Fortunately,
we don't do software updates all the time, but it's falling onto me to
figure out how to make the autovacuumer not do its thing in the middle of
future software updates :-). I might do ALTER TABLES on all tables to
temporarily increase their autovacuum_freeze_max_age before, and undo it
after. Kind of hacky, but it should work.

> So if you have lull time at night, it would be a good idea to preemptively
> vacuum tables approaching autovacuum_freeze_max_age (and also exceeding
> vacuum_freeze_table_age). I wouldn't even do VACUUM FREEZE, just VACUUM.
>

I'll investigate this. I found Josh Berkus' excellent articles, [Freezing
your tuples off](
http://www.databasesoup.com/2012/09/freezing-your-tuples-off-part-1.html)
and the nightly [Flexible Freeze](
https://github.com/pgexperts/flexible-freeze) program that came out that.
I'll investigate using this so we can better control when this happens.
Although, given that the IO load even during production hours isn't making
anyone scream, I might just leave it alone.

But aware that, be default setting, autovac is highly throttled for IO,
> while regular vacuum is entirely unthrottled for IO. So if nighttime is not
> completely idle but only relatively less busy of user activity, you might
> want to evaluate what level of throttling is appropriate.
>

Thanks Jeff for all your help and insight.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message William Dunn 2015-05-12 19:02:36 Re: finding tables about to be vacuum freezed
Previous Message Wayne E. Seguin 2015-05-12 14:21:09 Re: [BDR] Node Join Question