Re: Enabling Autovacuum Postgres 9.1 (was Unexpectedly high disk space usage)

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Lists <lists(at)benjamindsmith(dot)com>
Cc: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, pgsql <pgsql-general(at)postgresql(dot)org>
Subject: Re: Enabling Autovacuum Postgres 9.1 (was Unexpectedly high disk space usage)
Date: 2012-11-13 18:31:24
Message-ID: CAMkU=1ymL8mZY0MygGkAaVC9py_ou13p2F+EYSkbTmXqmUzzkw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Nov 12, 2012 at 12:04 PM, Lists <lists(at)benjamindsmith(dot)com> wrote:
>
>
> Should I increase the max_workers field from the default of 3 to (perhaps)
> 10?

I would not. You report that the reason you turned off autovac is
because it made your database crawl when it kicked in. That suggests
that if anything you should reduce that parameter (assuming you are
still doing manual vacuums at off-peak hours, leaving autovacuum to
only mop up what is left).

>
> Why would I want to reduce the cost delay to 0, and how does this relate to
> cost_limit? Careful reading of the docs:
> http://www.postgresql.org/docs/9.1/static/runtime-config-resource.html makes
> me believe that, given my substantial I/O subsystem, I'd want to drop
> cost_delay to near zero and set the cost_limit really high, which is a rough
> restatement of the last quoted paragraph above. (I think)

Given that autovac kicking in destroys your performance, I think that
your I/O subsystem may not be all that you think it is. Do you have
test/dev/QA system with the same subsystem that you can use for
investigation? If so, do you have a vaguely realistic load generator
to drive those systems?

> Assuming that I make these suggestions and notice a subsequent system load
> problem, what information should I be gathering in order to provide better
> post-incident forensics?

If you are going to be focusing your undivided attention on monitoring
the system during the period, just keeping a window open with "top"
running is invaluable. (On most implementations, if you hit 'c' it
will toggle the command display so you can see the results of
"update_process_title=on")

Also, "sar" is useful, and on most systems has the advantage that its
stats are always being gathered without you having to do anything, so
it works well for unexpected problems arising. I often just have
"vmstat 1 -t" running in the background streaming into a log file, for
the same reason.

For internal to pgsql, set log_min_duration_statement to a value which
few statements will exceed under normal operations, but many will when
things bog down. That way you can figure out exactly when things
bogged down after unattended operation, to correlate it with the
sar/vmstat/etc reports.

I'd also set for the probationary period (if you haven't already):

log_lock_waits = on
log_checkpoints = on
log_autovacuum_min_duration = 0 (or some smallish positive value)

The last one only logs when it finishes vacuuming a table. I wish
there was a way to make it log when it started as well, but I don't
think there is.

Cheers,

Jeff

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2012-11-13 19:51:15 Re: Running out of memory while making a join
Previous Message Clemens Park 2012-11-13 16:51:50 Using window functions to get the unpaginated count for paginated queries