Re: Heavily fragmented table and index data in 8.0.3

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Zoltan Boszormenyi" <zb(at)cybertec(dot)at>
Cc: "pgsql general" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Heavily fragmented table and index data in 8.0.3
Date: 2008-06-06 20:15:10
Message-ID: 87lk1icoc1.fsf@oxford.xeocode.com
Lists: pgsql-general


"Zoltan Boszormenyi" <zb(at)cybertec(dot)at> writes:

> Also, VACUUM FULL also takes too much time, on an otherwise idle database, I
> worked on a copy of their live database. During VACUUM, _bt_getbuf() was
> also called repeatedly with the block number jumping up and down.

VACUUM or VACUUM FULL? VACUUM should only read the table sequentially, but
VACUUM FULL behaves exactly as you describe, which is one of the reasons it
sucks so much.

That said, I think 8.0's VACUUM does jump around when cleaning indexes. That's
a big reason to upgrade to a more modern version. More recent VACUUMs (but
not VACUUM FULL) do only sequential scans of both the table and the indexes.

VACUUM FULL also tends to bloat indexes. If you've been running VACUUM FULL
regularly on this table, you may want to REINDEX it.
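
For example -- just a sketch, with "mytable" and "mytable_pkey" as
placeholder names to substitute with your own:

    -- Rebuild every index on the table (takes strong locks,
    -- so schedule it for a quiet period):
    REINDEX TABLE mytable;

    -- Or rebuild just the one bloated index:
    REINDEX INDEX mytable_pkey;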

> I know, 8.0.3 is quite old. But nothing jumped out from the changelog
> up to 8.0.15 that would explain this excessive slowness. SELECTs are
> pretty fast on any of the tables I tried, but INSERT hangs on this table.
> How does this fragmentation happen and how can we prevent this situation?

I'm not sure "fragmentation" has a direct analogue here, since each tuple lies
entirely on one page. Unless, perhaps, you have a problem with TOAST data
being laid out poorly. Are any individual rows over 2kB?
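
One quick way to check -- a sketch assuming a table named "mytable", and
note that pg_column_size() only appeared in 8.1, so on 8.0 you'd have to
approximate with octet_length() on the wide columns:

    -- Count rows whose row representation exceeds ~2kB:
    SELECT count(*)
      FROM mytable
     WHERE pg_column_size(mytable.*) > 2048;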

The problems people do run into are either

a) lots of dead space, either because vacuum (plain old vacuum, not full)
wasn't run regularly, or because large batch updates or deletes left dead
space behind that later activity could never reuse

b) index bloat, either due to the above or due to deleting many but not all
tuples from a key range and then never inserting into that range again.
Indexes can only reuse that space if you insert into the same page again or
if you delete all the tuples on the page.
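
To measure how much of either you have, VACUUM VERBOSE reports removable and
nonremovable tuples per table and index, and the contrib module pgstattuple
gives a direct dead-space percentage. A sketch, again with a placeholder
table name:

    -- Per-table and per-index report of dead tuples and free space:
    VACUUM VERBOSE mytable;

    -- If contrib/pgstattuple is installed:
    SELECT * FROM pgstattuple('mytable');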

One trick you could use, if you can stand the downtime, is to periodically
CLUSTER the table. Older versions of Postgres had concurrency bugs in
CLUSTER to watch out for, but as long as you don't run it at the same time as
a very long-running transaction such as pg_dump, it shouldn't be a problem.
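
Roughly like this (placeholder names again; on 8.0 you name the clustering
index explicitly the first time):

    -- Rewrite the table in the order of the given index. This takes an
    -- exclusive lock and rebuilds all the table's indexes as a side effect,
    -- so it deals with both table and index bloat at once:
    CLUSTER mytable_pkey ON mytable;

    -- Later runs can reuse the remembered clustering index:
    CLUSTER mytable;

    -- The physical order (and hence correlation statistics) changes, so:
    ANALYZE mytable;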

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!
