Re: index bloat

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David Esposito" <pgsql-general(at)esposito(dot)newnetco(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: index bloat
Date: 2005-07-13 21:39:33
Message-ID: 21835.1121290773@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

"David Esposito" <pgsql-general(at)esposito(dot)newnetco(dot)com> writes:
> ... and the way new keys are
> inserted into the index is to always add them to a new page (where the 'new'
> page is either a truly new page, or a page that is completely empty), rather
> than using up some of the fragmented space within existing pages?

Well, they are added to a page associated with the key range they are
in. You can't just stuff keys into any randomly chosen index page;
otherwise the index wouldn't be readily searchable, which is more or
less the entire point of an index.

>> The testing I've been doing so far involves UPDATEs that touch a
>> uniformly distributed subset of the table --- maybe that's the aspect
>> that is failing to match your reality. Do you mostly update
>> recently-added rows? Can you quantify the effect at all?

> This is probably a good point ... The histogram of UPDATE activity to the
> table probably looks something like

> 90% - records created < 24 hours ago
> 7% - records created 1 - 2 days ago
> 2% - records created 2 - 7 days ago
> 1% - records older than 7 days

Ah; now I think things are starting to make sense. We had already
estimated that about 10% of the records are updated each day, but
what this says is that the majority of those updates happen to records
that have never yet seen a VACUUM --- which is to say the last 2%
(1/50th) of the table. So roughly speaking, the average record is
updated about five times in its first day of existence, and on
average less than once during its remaining 49 days of existence?

Given those sorts of numbers, what we've got is that by the time the
nightly VACUUM runs, the leading-edge part of the index (today's
entries) has been bloated to about 5x what its minimum size would be.
And what this means is that the steady-state situation will be that
sort of density throughout the whole index. The difficulty is that
the btree code will only reclaim entirely-empty index pages for reuse.
Given that an index on integer keys can fit about 500 keys per page,
even a 5x bloated index has still got about 100 keys per page, making
it fairly unlikely for any of the pages to go completely empty until
you purge that whole range of keys at the end of the record's life.

This is a situation where it'd be nice if VACUUM could merge adjacent
partly-full index pages so as to reclaim space before the pages go
entirely empty on their own. We looked at doing that when the btree
compaction code was first written, but set it aside as too hard because
of various concurrency problems. (I think this is actually the first
case I've seen reported from the field where that decision looks wrong.
You've got a pretty odd update distribution here --- not so much that
the skew to recent entries is surprising, as that the index keys are
all on non-updating fields and so there's no spreading out of the index
ranges affected by the updates.)

Barring someone stepping up and making page merging happen (don't hold
your breath), it seems you've basically got two alternatives:

1. VACUUM more often than once per day.

2. Live with a steady-state index size that's about 5x the minimum.

I'd recommend taking a hard look at choice #1 --- you could experiment
with using the "vacuum cost delay" settings to tamp down VACUUM's I/O
demand to the point where it doesn't kill interactive performance, and
then run it maybe every hour or two on the table(s) where you have this
problem.

(Memo to hackers: this is a fairly interesting case for autovacuum
I think. The overall update rate on the table is not high enough to
trigger frequent vacuums, unless autovacuum is somehow made aware that
particular index key ranges are getting hit more heavily than others.
Maybe this says that autovac needs to be tracking btree index page
splits, or some such statistic, more than just overall updates.)

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message snacktime 2005-07-13 21:42:25 Re: Transparent encryption in PostgreSQL?
Previous Message Jim C. Nasby 2005-07-13 21:11:12 Re: To Postgres or not