Re: Interpreting vacuum verbosity

From: "Ed L(dot)" <pgsql(at)bluepolka(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Interpreting vacuum verbosity
Date: 2004-05-07 05:27:10
Message-ID: 200405062327.10963.pgsql@bluepolka.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thursday May 6 2004 10:30, Tom Lane wrote:
> "Ed L." <pgsql(at)bluepolka(dot)net> writes:
> > If I see VACUUM ANALYZE VERBOSE output like this...
> >
> > INFO: --Relation public.foo--
> > INFO: Index idx_foo_bar: Pages 219213; Tuples 28007: Deleted 9434.
> > CPU 17.05s/4.58u sec elapsed 3227.62 sec.
> >
> > ...am I correct in reading this to say that it took more than 53
> > minutes (3227 secs) to get 17 seconds of CPU time? Is this an
> > indicator of possible I/O contention?
>
> More like "your disk drives are being pounded into the ground" ?
>
> It's hard to evaluate this without knowing what else is going on in your
> system at the same time. In general a pure VACUUM process *ought* to be
> I/O bound. But without any additional data it's hard to say if 200:1
> CPU vs I/O ratio is reasonable or not. Were other things happening at
> the same time, and if so did they seem bogged down? What sort of
> hardware is this on anyway?

There was a ton of other activity; tens to hundreds of inserts and deletes
occurring per second. Lots of bogged down, ridiculously slow queries:
30-second selects on a 500-row table immediately after ANALYZE finished on
the table, absurdly long inserts, etc. This is a SmartArray 5i/32 RAID5
device with some sort of Dell RAID controller, I believe, 160mb/s, dual
3.2GHz xeons, plenty of RAM.

Some s/w redesign cut the I/O very signficantly, but it was still
ridiculously slow. After seeing the VACUUM ANALYZE VERBOSE output for the
most troublesomely slow table, and noticing 2.5M unused tuples there, we
decided to drop/recreate/reload that table to reclaim the space and on the
hunch that it might be related. We did that in a transaction without any
customer downtime, and upon reloading, the system was blazing fast again.
Joy. That was cool.

I guess the activity just totally outran the ability of autovac to keep up.
I was under the impression that unused tuples were only a diskspace issue
and not such a performance issue, but maybe the live data just got so
fragmented that it took forever to perform small scans over so many pages?

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Bastian 2004-05-07 07:54:14 Re: Verhindern, dass im Mehrbenutzerbetrieb mit veralteten Daten gearbteitet wird
Previous Message Tom Lane 2004-05-07 04:30:28 Re: Interpreting vacuum verbosity