Re: tuning autovacuum

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Bernd Helmle <mailings(at)oopsware(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tuning autovacuum
Date: 2011-06-09 21:35:55
Message-ID: 4DF13CBB.3020302@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 06/09/2011 04:43 PM, Bernd Helmle wrote:
> I'd go further and expose the info or details issued by VACUUM VERBOSE
> into the view, too, at least the number of pages visited and cleaned
> (or dead but not yet cleaned). Customers are heavily interested in
> these numbers and i've found pgfouine to provide those numbers very
> useful.

Agreed there. The fact that VACUUM VERBOSE reports them suggests
they're not too terribly difficult to track either.

What we'd probably need to do with those is handle them like the other
stats in the system: store a total number for visited/cleaned/dead for
each relation, then increment the total as each vacuum finishes. That
way, you could point a standard monitoring system at it and see trends.
Just saving the last snapshot of data there isn't as useful.

I'm seeing these as being like the counters in pg_stat_bgwriter; while
it's easy to think of VACUUM "what work happened?" data as info you just
want the last snapshot of, a continuous incrementing counter can do that
and a lot of other things too. Anyone who is extracting useful data
from pg_stat_bgwriter can use the same logic to track this data, even if
it only moves forward in big chunks as vacuum completes. And it may be
feasible to update it in the middle, too.

Stepping into implementation for a second, the stats that are showing up
in pg_stat_user_tables are being driven by a PgStat_MsgVacuum message
coming out of the stats collector when it finishes. While that's the
obvious place to put some more stuff, that's not necessarily the right
way to build a better monitoring infrastructure. Two things to consider:

-It's not really aimed at being called multiple times for one operation
("needs vacuum", "started vacuum", "finished vacuum"
-There is a mix of things that make sense as long-term counters and
things that update as snapshots--the timestamps are the main thing there.

I haven't thought about it enough to have a real opinion on whether you
can squeeze everything into the existing message by adding more fields,
or if another type of message is necessary. Just pointing out that it's
not trivially obvious which approach is better.

What is unambiguous is that all this new data is really going to need a
new view for it, pg_stat_vacuum or something like that. The fields that
are already in pg_stat_user_tables can stay there as deprecated for a
while, but this all wants to be in its own new view.

This would really be a nice medium sized feature that DBAs would love,
and it would help adoption on big sites. I have some ideas on how to
get some funding to develop it because I keep running into this, but if
someone wants to run with the idea I'd be happy to just help instead.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2011-06-09 21:38:48 Re: On-the-fly index tuple deletion vs. hot_standby
Previous Message Tom Lane 2011-06-09 21:10:10 Re: Postmaster holding unlinked files for pg_largeobject table