Re: Proposal for background vacuum full/cluster

From: Paul Tillotson <pntil(at)shentel(dot)net>
To: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal for background vacuum full/cluster
Date: 2005-04-21 00:10:23
Message-ID: 4266EF6F.9020505@shentel.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Jim C. Nasby wrote:

>I talked to a few people on IRC about this and they didn't think I was
>nuts, so maybe this is something practical...
>
>In a nutshell, my idea is to use the normal transactional/XID code to
>relocate tuples in the heap. Think of doing an UPDATE field=field if you
>could tell update what page to put the new tuple on.
>
Be careful not to fire UPDATE triggers on the tuple while doing so.

>Of course, it's not quite that simple. For starters, you'd want to do a
>conventional vacuum before this, both to free as much space as possible
>and to update the FSM. It might also be necessary to prevent backends
>from using the pages at the end of the heap (which you're trying to
>empty). I'm guessing that could be done just by removing the pages from
>the FSM. You'd also need to vacuum after emptying these pages to reclaim
>the disk space. To facilitate these things, it might be useful to be
>able to vacuum parts of the heap. So as pages are emptied at the end of
>the heap, they can be vacuumed and reclaimed while the pages are still
>probably in cache (and without requiring a re-vacuum of the entire
>table).
>
>
>
Keep in mind that the transaction that does the update can't also vacuum
it's own tuples. You'd have to end one transaction, then wait until
every transaction running while the updater ran finishes, then start the
transaction that vacuums. Obviously your command would need to be able
to start and end transactions. (Meaning that it can't be a user defined
function, and it probably can't be a normal self-contained command in
postgres.)

>Taking this technique one step further, it should also be possible to
>cluster in the background without blocking everything. One way to do
>this would be to empty the first page in the heap by moving it's tuples
>elsewhere, and vacuuming that page (but not putting it in the FSM). Once
>that page is available, you can start reading in from the clustering
>index and moving those tuples to the first page.
>
>One thing that might be an issue for both ideas is index bloat. But
>since reindex is a non-blocking operation, it doesn't seem unreasonable
>to either do that automatically or have the user do it.
>
>Is this TODOable?
>
>
I asked about something like this on the -hackers list a while back, but
didn't get any response from any of the knowledgeable hackers.

Are you thinking of coding this, or just suggesting it for others? I
was thinking of coding something like this but found that I didn't
understand enough of the internals of how the vacuum command actually
works to be able to write this. I'd be willing to devote perhaps a few
hours a week to it if you want to help me.

Regards,

Paul Tillotson

-----------------

P. S.

The last time I thought about it, I decided that the best solution is
probably one that works just like vacuum full except that it scans the
table in reverse order. It would do something like this:

- Wait for exclusive lock.
- Start at the end of the table -- call this page I.
- If page I is completely empty, shrink the heap and go to step 1 again.
(Page I is not empty now.)
- *Scan forward in the table until you find a page that is empty. (Call
it J)
If no such page is found, there is no more free space in the
table. Exit.
- Move the tuples from page I to page J.
- Drop the exclusive lock. and go to step one.

*On subsequent iterations of the loop, do not reset J. i.e., start
scanning at the last place that free space was known to exist.
Presumably no useful amount of free space will get created in the table
while this algorithm is running.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David F. Skoll 2005-04-21 00:26:48 Re: Postgres: pg_hba.conf, md5, pg_shadow, encrypted passwords
Previous Message Gavin Sherry 2005-04-20 23:54:47 Re: WAL/PITR additional items