Re: pg_autovacuum next steps

From: Joe Conway <mail(at)joeconway(dot)com>
To: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_autovacuum next steps
Date: 2004-03-22 23:27:45
Message-ID: 405F7671.7070705@joeconway.com
Lists: pgsql-hackers

Matthew T. O'Connor wrote:
> * Inability to customize thresholds on a per table basis

I ran headlong into this one. IMHO fixing this is critical.

> * Inability to set default thresholds on a per database basis
> * Inability to exclude specific databases / tables from pg_autovacuum
> monitoring

These would be nice to have, but less critical than #1 I think.

> * Inability to schedule vacuums during off-peak times

This would be *really* nice to have. In my recent case, if pg_autovacuum
could work for say 3 minutes, and then back off for 2 minutes or so
while the batch transactions hit, it would be ideal.
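
As an illustration only, the work/back-off cycle could be sketched in
Python roughly like this. The function name and the fixed 180 s / 120 s
windows are assumptions made for the example; nothing like this exists
in pg_autovacuum today.

```python
def vacuum_allowed(elapsed_s: float,
                   work_s: float = 180.0,
                   rest_s: float = 120.0) -> bool:
    """Return True while the repeating cycle is in its work window.

    elapsed_s is the time in seconds since the cycle started. Each
    cycle is work_s seconds of vacuuming followed by rest_s seconds
    of backing off (hypothetical values: 3 minutes on, 2 minutes off).
    """
    period = work_s + rest_s
    return (elapsed_s % period) < work_s
```

A vacuum loop would check vacuum_allowed() before starting on each
table and sleep through the rest window otherwise.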

> I'm not sure how to address all of these concerns, or that they all
> should be addressed right now. One of my big questions is backend
> integration. I am leaning towards leaving pg_autovacuum as a client
> application in contrib for one more release. During this time, I can
> continue to tweak and improve pg_autovacuum so that we will have a very
> good idea what the final product should be before we make it a standard
> backend process.

I really think pg_autovacuum ought to get folded into the backend now,
for 7.5. I haven't had time yet to read the entire thread, but I saw
others making the same comment. It would make some of the listed
problems go away, or at least become far easier to deal with.

> For PostgreSQL 7.5, I plan to implement these new features:
>
> 1.Per database defaults and per table thresholds (including total
> exclusion)

Great!

> 2.Persistent data
> 3.Single-Pass Mode (external scheduling from cron etc...)
> 4.Off peak scheduling

Great again!

> 1. Per Database defaults and Per table Thresholds:
>
> 1.Store config data inside a special pg_autovacuum table inside
> existing databases that want custom settings.

A natural if folded into the backend.
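
However the settings end up being stored, the lookup order implied
above (per table setting, then per database default, with total
exclusion as an option) might be sketched like this. The dictionary
layout, the names, and the built-in fallback value are all assumptions
made for the example, not an existing pg_autovacuum format.

```python
from typing import Dict, Optional

BUILTIN_DEFAULT = 1000  # hypothetical fallback threshold (tuples changed)


def resolve_threshold(table: str,
                      table_settings: Dict[str, dict],
                      db_default: Optional[int] = None) -> Optional[int]:
    """Return the vacuum threshold for a table, or None if it is excluded.

    Precedence: per table entry, then per database default, then the
    built-in fallback.
    """
    entry = table_settings.get(table)
    if entry is not None:
        if entry.get("exclude"):
            return None  # table is excluded from monitoring entirely
        if "threshold" in entry:
            return entry["threshold"]
    if db_default is not None:
        return db_default
    return BUILTIN_DEFAULT
```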

> 3.Single-Pass Mode (External Scheduling):
>
> I have received requests to be able to run pg_autovacuum only on request
> (not as a daemon) making only one pass over all the tables (not looping
> indefinitely). The advantage being that it will operate more like the
> current vacuum command except that it will only vacuum tables that need
> to be vacuumed. This feature could be useful as long as pg_autovacuum
> exists outside the backend. If pg_autovacuum gets integrated into the
> backend and gets automatically started as a daemon during startup, then
> this option will no longer make sense.

It still might make sense. You could have a mode where the daemon
essentially sleeps forever, until explicitly woken up by a signal. When
woken, it makes one pass, and goes back to infinite sleep. Then provide
a simple way to signal the autovacuum process -- maybe an extension of
the current VACUUM syntax.
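
A minimal Python sketch of that sleep-until-woken mode follows. The
class and method names are hypothetical, and wake() simply stands in
for whatever mechanism ends up delivering the wakeup (a signal, a SQL
command, etc.).

```python
import threading
from typing import Callable, Optional


class OnDemandVacuum:
    """Sleeps until woken, makes one pass, then goes back to sleep."""

    def __init__(self, one_pass: Callable[[], None]) -> None:
        self._one_pass = one_pass        # does a single pass over all tables
        self._wake = threading.Event()
        self.passes = 0

    def wake(self) -> None:
        """Request one pass; safe to call from another thread."""
        self._wake.set()

    def run_once(self, timeout: Optional[float] = None) -> bool:
        """Block until woken (forever if timeout is None), then do one pass.

        Returns False if the timeout expired without a wakeup.
        """
        if not self._wake.wait(timeout):
            return False
        self._wake.clear()
        self._one_pass()
        self.passes += 1
        return True

    def run_forever(self) -> None:
        """Daemon loop: infinite sleep, interrupted only by wake()."""
        while True:
            self.run_once()
```

Wakeups that arrive while the daemon is still asleep collapse into a
single pass, which seems to be the behavior one would want here.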

> 4.Off-Peak Scheduling:
>
> A fundamental advantage of our vacuum system is that the work required
> to reclaim table space is taken out of the critical path and can be
> moved to an off-peak time when cycles are less precious. One of the
> drawbacks of the current pg_autovacuum is that it doesn't have any way
> to factor this in.
>
> In its simplest form (which I will implement first) I would add the
> ability to add a second set of thresholds that will be active only
> during an “off-peak” time that can be specified in the pg_autovacuum
> database, perhaps in a general_settings table.

I don't know how this would work, but it is for sure important. In the
recent testing I found that pg_autovacuum (well, lazy vacuum in general,
but I was using pg_autovacuum to control it) made a huge difference in
performance of batch transactions. They range from 4-5 seconds without
vacuum running, to as high as 15 minutes with vacuum running. With the
vacuum delay patch, delay = 1, pagecount = 8, I still saw times go as
high as 10 minutes. Backing vacuum off any more than that caused it to
fall behind the transaction rate unrecoverably. But as I said above, if
the transactions could complete in 4-5 seconds without vacuum running,
and vacuuming then resumed for the 3-to-4 minutes between batches, all
would be well.
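
For the simplest form Matthew describes (a second set of thresholds
that is active only during an off-peak window), the selection step
might look something like the sketch below. The function name, the
22:00-06:00 default window, and the idea of passing the thresholds in
as plain values are assumptions made for the example.

```python
from datetime import time


def active_thresholds(now: time, peak, off_peak,
                      start: time = time(22, 0),
                      end: time = time(6, 0)):
    """Return off_peak while now falls in [start, end), else peak.

    The window may wrap past midnight, e.g. 22:00 to 06:00.
    """
    if start <= end:
        in_window = start <= now < end
    else:
        # Window wraps midnight: late evening or early morning.
        in_window = now >= start or now < end
    return off_peak if in_window else peak
```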

Joe
