Re: db size and VACUUM ANALYZE

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Amitabh Kant <amitabhkant(at)gmail(dot)com>, Marcin Krol <mrkafk(at)gmail(dot)com>, PGSQL Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: db size and VACUUM ANALYZE
Date: 2010-02-13 08:38:50
Message-ID: dcc563d11002130038t39543fb4p3ed8a5570b2a37e8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Sat, Feb 13, 2010 at 12:19 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> Amitabh Kant wrote:
>>
>> You need to do VACUUM FULL ANALYZE to claim the disk space, but this
>> creates a exclusive lock on the tables.
>> See http://www.postgresql.org/docs/8.3/static/sql-vacuum.html
>
> First off, you don't need the ANALYZE in there.
>
> Second, VACUUM FULL is a terrible way to fix a table that's seriously
> screwed up--it will take forever to run.  Use CLUSTER to accomplish the same
> thing much faster; it basically does the same thing as the dump/restore step
> that's restoring good performance to the database.

This is a bit of an oversimplification. I've found that selecting the
contents of the table out, truncating the table, and inserting them
back in from a select with an order by can be orders of magnitude
faster than cluster IF the data in the table is basically random.
After that, cluster can perform reasonably well to keep the table
clustered, because it's mostly in order already. Basically, unless
it's been fixed in 9.0, cluster reads the table by index entry one row
at a time and builds the new table. This is very very slow for a
randomly ordered table.

> Before doing that, I would run a VACUUM VERBOSE on the whole cluster and see
> if there are any max_fsm_pages warnings in there.  Those settings might be
> too low, for example if large deletions are done in batches, and ultimately
> be the true cause of this problem.

Good point, if he's blowing out the fsm regularly then the fix above
will be temporary at best.

Since setting fsm pages / relations is basically very cheap, it's a
good idea to set them a few times higher than what you need, so if you
need 1M set it to 10M to give a big buffer in case things get worse
over time. Especially since fsm pages is a restart requiring change.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Greg Smith 2010-02-13 09:12:39 Re: COPY FROM wish list
Previous Message Greg Smith 2010-02-13 07:41:11 Re: Memory Usage and OpenBSD