Re: autovacuum not prioritising for-wraparound tables

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: autovacuum not prioritising for-wraparound tables
Date: 2013-02-03 17:25:37
Message-ID: 1359912337.89604.YahooMailNeo@web162905.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-02-01 15:09:34 -0800, Jeff Janes wrote:

>> Since freeze_min_age was mistakenly being used, the limit
>> would be 50 million in the past (rather than 150 million) under
>> defaults.  But since the last full-table vacuum, whenever that was,
>> used freeze_min_age for its intended purpose, that means the 50
>> million in the past *at the time of that last vacuum* is the highest
>> that relfrozenxid can be.  And that is going to be further back than
>> 50 million from right now, so the vacuum will always be promoted to a
>> full scan.

> This is rather scary. How come nobody noticed that this major
> performance improvement was effectively disabled for that long?
>
> I wonder if Kevin's observations about the price of autovac during
> OLTPish workloads isn't at least partially caused by this. It will cause
> lots of io prematurely because it scans far more than it should and a
> VACUUM FREEZE will push it off.

Well, there seem to be multiple problems with autovacuum which are
serious enough that people are noticing severe hits to production
performance, figuring out that it is autovacuum, turning off
autovacuum and getting immediate relief, and then calling for
support a few months later when the inevitable consequences hit.
At that point, of course, there is a lot of pain, and it is hard to
recover from it without a sizable maintenance window.

I was able to confirm two cases where this was a consequence of the
lazy truncate logic which Jan recently fixed, but there are clearly
other problems which I didn't have much of a grasp on prior to this
thread.  The only thing I knew for sure was that a bulk load, and
in particular a pg_dump/restore cycle, was a time bomb without
VACUUM FREEZE, and users are often reluctant to add time for that
to their maintenance window.  If we can't load tuples as frozen in
the first plance, the next best thing would be to nibble away at
freezing them in the background, so that there is no single big
hit.  While proposals here seem to deal with very real problems
which we should fix, I'm not sure that anything here addresses this
issue.  COPY FREEZE does address this, where it can be used; but
I'm not sure that we don't need to address it in the autovacuum end
of things, too.

These problems are serious enough to merit cautious back-patching,
in my view; the current state of affairs really is causing serious
disruption of production environments.

-Kevin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2013-02-03 17:53:55 Re: [COMMITTERS] pgsql: Mark vacuum_defer_cleanup_age as PGC_POSTMASTER.
Previous Message Andres Freund 2013-02-03 16:41:20 Re: autovacuum not prioritising for-wraparound tables