Re: Bloated tables and why is vacuum full the only option

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Tomas Vondra <tv(at)fuzzy(dot)cz>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Bloated tables and why is vacuum full the only option
Date: 2014-02-09 19:40:00
Message-ID: 19897.1391974800@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Claudio Freire <klaussfreire(at)gmail(dot)com> writes:
>>> I also do routine reindexing to stop index bloat on its tracks, yet
>>> freshly-reindexed indexes get considerably reduced in size with vacuum
>>> full.

AFAIK there's no reason for vacuum full to produce a different result
from reindex. Did you mean to say that the indexes get smaller than
what they had been after some normal operation? If so it's worth noting
this comment from the btree index building code (nbtsort.c):

* It is not wise to pack the pages entirely full, since then *any*
* insertion would cause a split (and not only of the leaf page; the need
* for a split would cascade right up the tree). The steady-state load
* factor for btrees is usually estimated at 70%. We choose to pack leaf
* pages to the user-controllable fill factor (default 90%) while upper pages
* are always packed to 70%. This gives us reasonable density (there aren't
* many upper pages if the keys are reasonable-size) without risking a lot of
* cascading splits during early insertions.

As the comment notes, the initial state of a freshly-built index is packed
more densely than what you can expect after a lot of insertions/updates
have occurred. That's not a bug, it's just a fact of life.

Also, there are certain usage patterns that can result in btree indexes
having densities much lower than the conventional-wisdom 70%. The main
one I've seen in practice is "decimation", where you delete say 99 out
of every 100 entries in index order. This leaves just a few live entries
in each leaf page --- but our btree code doesn't reclaim an index page
for recycling until it's totally empty. So you can end up with a very
low load factor after doing something like that, and a reindex is the
only good way to fix it.

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Claudio Freire 2014-02-09 19:48:18 Re: Bloated tables and why is vacuum full the only option
Previous Message Claudio Freire 2014-02-09 19:13:53 Re: Bloated tables and why is vacuum full the only option