Re: serious problems with vacuuming databases

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To:
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: serious problems with vacuuming databases
Date: 2006-04-09 20:33:58
Message-ID: 44396FB6.6000900@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tom Lane wrote:
> Tomas Vondra <tv(at)fuzzy(dot)cz> writes:
>> 1) drop, create and fill table B (aggregated data from A, C, D)
>> 2) copy 'old' data from A, C and D to A_old, C_old a D_old
>> 3) delete old data from A, C, D
>> 4) dump data from A_old, C_old and D_old
>> 5) truncate tables A, C, D
>> 6) vacuum full analyze tables A, C, D, A_old, C_old and D_old
>
> Steps 3/5/6 make no sense at all to me: why bother deleting data retail
> when you are about to truncate the tables, and why bother vacuuming a
> table you just truncated? Is the above *really* what you did?

Yes, the above is exactly what I did with the exception that there's an
error in the step (5) - there should be truncation of the _old tables.
The reasons that led me to this particular steps are two:

(a) I don't want to delete all the data, just data older than two days.
Until today we've kept all the data (containing two years access log
for one of our production websites), but now we've decided to remove
the data we don't need and leave just the aggregated version. That's
why I have used DELETE rather than TRUNCATE.

(b) I want to create 'incremental' backups, so once I'll need the data
I can take several packages (dumps of _old tables) and import them
one after another. Using pg_dump doesn't allow me this - dumping the
whole tables A, C and D is not an option, because I want to leave
some of the data in the tables.

From now on, the tables will be cleared on a daily (or maybe weekly)
basis, which means much smaller amount of data (about 50.000 rows
a day).
>
>> The problem is this - today, we run a scheduled VACUUM FULL ANALYZE for
>> the whole database, and it runs for about 10 hours already, which is
>> much more than usual (and it is still running).
>
> Is it actually grinding the disk, or is it just blocked waiting for
> someone's lock? If it's actually doing work, which table is it working
> on? (You should be able to figure that out by looking in pg_locks,
> or by strace'ing the process to see which files it's touching.)

Thanks for the hint, I'll try to figure that in case the dump/reload
recommended by Alvaro Herrera doesn't help. But as far as I know the
disks are not grinded right now, so I guess it's the problem with indexes.

t.v.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tomas Vondra 2006-04-09 20:37:05 Re: serious problems with vacuuming databases
Previous Message Bruce Momjian 2006-04-09 20:24:44 Re: Scaling up PostgreSQL in Multiple CPU / Dual Core