serious problems with vacuuming databases

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-performance(at)postgresql(dot)org
Subject: serious problems with vacuuming databases
Date: 2006-04-09 18:22:51
Message-ID: 443950FB.4050707@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello,

we have some performance problems with postgres 8.0.4, more precisely
with vacuuming 'large' database with a lot of deleted rows.

We had a 3.2 GB database, consisting mainly from 4 large tables, two of
them (say table A and B) having about 14.000.000 of rows and 1 GB of
size each, and two (say C and D) having about 4.000.000 of rows and 500
MB each. The rest of the database is not important.

We've decided to remove unneeded 'old' data, which means removing about
99.999% of rows from tables A, C and D (about 2 GB of data). At the
beginning, the B table (containing aggregated from A, C and D) was
emptied (dropped and created) and filled in with current data. Then,
before the deletion the data from tables A, C, D were backed up using
another tables (say A_old, C_old, D_old) filled in using

INSERT INTO A SELECT * FROM A_old ...

and fixed so there are no duplicities (rows both in A and A_old). Then
these data were deleted from A, C, D and tables A_old, C_old and D_old
were dumped, truncated and all the tables were vacuumed (with FULL
ANALYZE options). So the procedure was this

1) drop, create and fill table B (aggregated data from A, C, D)
2) copy 'old' data from A, C and D to A_old, C_old a D_old
3) delete old data from A, C, D
4) dump data from A_old, C_old and D_old
5) truncate tables A, C, D
6) vacuum full analyze tables A, C, D, A_old, C_old and D_old

So the dump of the fatabase has about 1.2 GB of data, from which about
1 GB is in the B table (the one rebuilt in step 1). This was done yesterday.

The problem is this - today, we run a scheduled VACUUM FULL ANALYZE for
the whole database, and it runs for about 10 hours already, which is
much more than usual (and it is still running).

The hardware is not too bad - it's Dell server with 2 x 3.0 GHz P4 HT,
4GB of RAM, 2x15k SCSI drives in hw RAID etc.

The question is why this happens and how to get round that. I guess it's
caused by a huge amount of data deleted yesterday, but on the other side
all the modified tables were vacuumed at the end. But I guess dropping
and reloading the whole database would be much faster (at most 1.5 hour
including creating indexes etc.)

thanks for your advices
Tomas

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Alvaro Herrera 2006-04-09 18:37:47 Re: serious problems with vacuuming databases
Previous Message Rajesh Kumar Mallah 2006-04-09 15:34:43 Re: pls reply ASAP