Re: scheduling autovacuum at lean hours only.

From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: scheduling autovacuum at lean hours only.
Date: 2009-02-13 17:37:21
Message-ID: 87wsbu8cha.fsf@dba2.int.libertyrms.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

mallah(dot)rajesh(at)gmail(dot)com (Rajesh Kumar Mallah) writes:
> why is it not a good idea to give end users control over when they
> want to run it ?

It's not a particularly good idea to give end users things that they
are likely then to *immediately* use to shoot themselves in the foot.

Turning off vacuuming "all day" is the sort of thing that is indeed
pretty certain to hurt you when you imagined it was going to help you.

In particular, if you shut off autovac all day, heavily updated tables
with certain sorts of (pretty common!) update patterns are certain to
"bloat up," to the point that you'll need to do CLUSTER/VACUUM FULL on
them.

In effect, the practical effect of "autovacuum at lean hours only" is
more reasonably described as "cancel autovacuum and revert to the
elder policy of requiring users to do manual vacuuming."

It's worth looking at how autovacuum has been evolving over time...

- When it was introduced, 8.0-ish (maybe it was 8.1 when it became
"official"), it was pretty primitive.

Autovac was a single process, where you had three controls over
behaviour:

- You could run autovac, or not;

- You could exclude specific tables from being processed by autovac

- There is a capability to vacuum less aggressively by using
delays to reduce autovac I/O usage

- In 8.3, it was enhanced to add the notion of having multiple vacuum
workers

There was discussion about having one of those workers restrict
itself to small tables, so that you'd never have the scenario where
the workers were all busy and a small table that needed vacuuming
was left unvacuumed for a long time. It appears that didn't happen,
which seems unfortunate, but that's life...

You should look at all the "knobs" that *are* offered before deciding
a policy that may be harmful to performance. As things stand now,
there are a couple of ways I could see tuning this:

- You might check on the GUC variables autovacuum_vacuum_cost_delay
and autovacuum_vacuum_cost_limit, which would allow you to restrict
the I/O cost.

This might allow you to run autovacuum all the time without
adversely affecting performance.

- You might come up with a list of the *LARGE* tables that you don't
want vacuumed during the day, and set up a cron job that adds/drops
them from the pg_autovacuum table at the appropriate times.

This is definitely going to be more finicky, and requires a great
deal more awareness of the tables being updated by your
applications. It makes "autovacuum" a whole lot less "automatic."

There are known enhancements coming up:

- In 8.4, there is a capability for VACUUM to only process the
portions of the table known to have been altered.

That ought to be a better answer than *any* of the fiddling
suggested, to date. Right now, a VACUUM on "public.my_huge_table",
a table 18GB in size, will walk through the entire table, even
though there were only a handful of pages where tuples were
invalidated.

This is almost certainly the single best improvement possible to
resolve your issue; it seems likely to *directly* address the
problem, and has the considerable merit of not requiring much if
any configuration/reconfiguration/scheduling.
--
(reverse (concatenate 'string "gro.mca" "@" "enworbbc"))
http://linuxdatabases.info/info/
"what would we do without C? we would have PASAL, BASI, OBOL, and
Ommon Lisp." -- #Erik

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Craig Ringer 2009-02-14 08:46:07 Re: dissimilar drives in Raid10 , does it make difference ?
Previous Message Kevin Grittner 2009-02-13 17:35:29 Re: I/O increase after upgrading to 8.3.5