Re: Vacuum daemon (pgvacuumd ?)

From: Lincoln Yeoh <lyeoh(at)pop(dot)jaring(dot)my>
To: mlw <markw(at)mohawksoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum daemon (pgvacuumd ?)
Date: 2002-03-06 05:14:56
Message-ID: 5.1.0.14.1.20020306130128.02c9b190@192.228.128.13
Lists: pgsql-hackers

I'm thinking that unless the vacuum daemon needs backend info (stats?), it
could be a totally separate entity, perhaps reading from the same config
file. However, that would make it slightly harder to start up automatically.

I'm not fond of the duty cycle idea. My guess is that in too many cases, if
you delay the vacuum it tends to take longer; then you delay even more, and
it takes even longer...

It should be related to the number of updates and deletes on a table or
database.

Maybe you don't need the duty cycle: just check the stats every X minutes
and, if there are enough invalid rows, do a vacuum. Issue: you could end up
vacuuming continuously; would that impact performance drastically?
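A threshold check like that can be very simple; here is a minimal sketch
(the tuple counts and the 20% cutoff are illustrative assumptions, not
anything the backend exposes under these names):

```python
# Sketch: decide whether a table is worth vacuuming, based on the ratio
# of dead (invalid) tuples to all tuples.  The 0.20 cutoff is arbitrary.

def needs_vacuum(n_live: int, n_dead: int, max_dead_ratio: float = 0.20) -> bool:
    """Return True when dead tuples exceed the allowed fraction of the table."""
    total = n_live + n_dead
    if total == 0:
        return False
    return n_dead / total > max_dead_ratio
```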

If there's vacuuming to be done, is it better to do it later rather than
now? My assumption is that lazy vacuum no longer has such a severe impact,
so it might be better to just do it ASAP. So actually a simple vacuum daemon
may be good enough.

Is there a danger of high file fragmentation with frequent lazy vacuums?

Regards,
Link.

At 09:21 PM 05-03-2002 -0500, mlw wrote:
>(for background, see conversation: "Postgresql backend to perform vacuum
>automatically" )
>
>In the idea phase 1, brainstorm
>
>Create a table for the defaults in template1
>Create a table in each database for state information.
>
>Should have a maximum duty cycle for vacuum vs. non-vacuum on a per-table
>basis.
>If a vacuum takes 3 minutes, and the duty cycle is no more than 10%, the next
>vacuum cannot take place for another 30 minutes. Is this a table or database
>setting? I am thinking table. Anyone have good arguments for database?
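For concreteness, that duty-cycle rule amounts to dividing the last vacuum's
duration by the allowed duty cycle (a sketch; the function name is made up):

```python
# Duty cycle here means vacuum time vs. non-vacuum time: a 3-minute
# vacuum at a 10% cap forbids another vacuum for 3 / 0.10 = 30 minutes.

def minutes_until_next_vacuum(vacuum_minutes: float, duty_cycle: float) -> float:
    """Minimum rest period implied by the last vacuum's duration."""
    return vacuum_minutes / duty_cycle
```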
>
>Must have a trigger point based on the ratio of dirty tuples to total tuples.
>Unfortunately some tuples are more important than others, but I don't know
>how to really detect that. We should be able to keep track of the number of
>dirty tuples in a table. Is it known how many tuples are in a table at any
>point? (If so, on a side note, can we use this for a count()?) How about dirty
>tuples?
>
>Is the number of deleted tuples sufficient to decide priority on vacuum? My
>thinking is that the tables with the most deleted tuples are the ones that
>need vacuuming most. Should it be the ratio of deleted tuples to total
>tuples, or just the count of deleted tuples? I am thinking ratio, but maybe
>it needs to be tunable.
>
>
>Here is the program flow:
>
>(1) Startup. (Do this for each database.)
>(2) Get all the information from a vacuumd table.
>(3) If the table does not exist, perform a vacuum on all tables, and
>initialize the table to the current state.
>(4) Check which tables can be vacuumed based on their duty cycle and the
>current time.
>(5) If the tables eligible to be vacuumed have deleted tuples which exceed
>acceptable limits, vacuum them.
>(6) Wait a predefined time, then loop to (2).
>
>This is my basic idea, what do you all think?
>
>I plan to work on this in the next couple of weeks. Any suggestions, notes,
>concerns, or features would be welcome.
>
>---------------------------(end of broadcast)---------------------------
>TIP 3: if posting/reading through Usenet, please send an appropriate
>subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
>message can get through to the mailing list cleanly
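The quoted program flow could be sketched roughly as one decision pass per
wakeup. Everything below is hypothetical: the stats dict, the field names,
and the thresholds are assumptions for illustration, not an existing
PostgreSQL interface.

```python
# One pass of a hypothetical vacuumd decision loop: apply the duty-cycle
# rest period first, then the dirty-tuple trigger.  All names are made up.

def tables_to_vacuum(stats, now, max_dead_ratio=0.20, duty_cycle=0.10):
    """stats maps table name -> dict with last_vacuum (epoch seconds),
    last_duration (seconds), and live/dead tuple counts."""
    eligible = []
    for table, s in stats.items():
        rest = s["last_duration"] / duty_cycle   # duty-cycle rest period
        if now - s["last_vacuum"] < rest:
            continue                             # vacuumed too recently
        total = s["live"] + s["dead"]
        if total and s["dead"] / total > max_dead_ratio:
            eligible.append(table)               # deleted tuples over the limit
    return eligible
```

A real daemon would run this every X minutes, vacuum the returned tables,
and write each vacuum's start time and duration back into the per-database
state table.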
