Re: Vacuum daemon (pgvacuumd ?)

From: "Nicolas Bazin" <nbazin(at)ingenico(dot)com(dot)au>
To: "mlw" <markw(at)mohawksoft(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum daemon (pgvacuumd ?)
Date: 2002-03-06 03:14:02
Message-ID: 001101c1c4bc$f7667fb0$660d090a@software.ingenico.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


----- Original Message -----
From: "mlw" <markw(at)mohawksoft(dot)com>
To: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Sent: Wednesday, March 06, 2002 1:21 PM
Subject: [HACKERS] Vacuum daemon (pgvacuumd ?)

> (for background, see conversation: "Postgresql backend to perform vacuum
> automatically" )
>
> In the idea phase 1, brainstorm
>
> Create a table for the defaults in template1
> Create a table in each database for state inforation.
>
> Should have a maximum duty cycle for vacuum vs non-vacuum on a per table
basis.
> If a vacuum takes 3 minutes, and a duty cycle is no more than 10%, the
next
> vacuum can not take place for another 30 minutes. Is this a table or
database
> setting? I am thinking table. Anyone have good arguments for database?
>
> Must have a trigger point of number of total tuples vs number of dirty
tuples.
> Unfortunately some tuples are more important than others, but that I don't
know
> how to really detect that. We should be able to keep track of the number
of
> dirty tuples in a table. Is it known how many tuples are in a table at any
> point? (if so, on a side note, can we use this for a count()?) How about
dirty
> tuples?
This parameters are certainly correct in a lot of cases, but why not use a
stored proc to decide when to start a vacuum. The system table can maintain
raw data related to vacuum: last vacuum timestamp, previous vacuum duration,
table priority, .... Then a parameter can be used let say from 0 to 9 (0 for
no vacuum) as a vacuum profile. The stored proc. would translate this
profile to thresholds adapted to its algorithm that can use the per-table
statistic that already exist.
Obviously a standard proc can be installed but it lets the DBA the
possibility to adapt the criteria to its DB whith no modification to the
code.

>
> Is the number of deleted tuples sufficient to decide priority on vacuum?
My
> thinking is that the tables with the most deleted tuples is the table
which
> need most vacuum. Should ratio of deleted tuples vs total tuples or just
count
> of deleted tuples. I am thinking ratio, but maybe it need be tunable.
>
>
> Here is the program flow:
>
> (1) Startup (Do this for each database.)
> (2) Get all the information from a vacuumd table.
> (2) If the table does not exist, perform a vacuum on all tables, and
initialize
> the table to current state.
> (3) Check which tables can be vacuumed based on their duty cycle and
current
> time.
> (4) If the tables eligible to be vacuumed have deleted tuples which exceed
> acceptable limits, vacuum them.
> (5) Wait a predefined time, loop (2)
>
> This is my basic idea, what do you all think?
>
> I plan to work on this in the next couple weeks. Any suggestions, notes,
> concerns, features would be welcome.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2002-03-06 03:15:09 Re: Do we still have locking problems with concurrent users
Previous Message Neil Conway 2002-03-06 03:13:25 Re: Do we still have locking problems with concurrent