Re: Autovacuum Improvements

From: "Joris Dobbelsteen" <Joris(at)familiedobbelsteen(dot)nl>
To: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc: "Chris Browne" <cbbrowne(at)acm(dot)org>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Autovacuum Improvements
Date: 2007-01-10 18:26:36
Message-ID: 73427AD314CC364C8DF0FFF9C4D693FF54E2@nehemiah.joris2k.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

>-----Original Message-----
>From: Alvaro Herrera [mailto:alvherre(at)commandprompt(dot)com]
>Sent: dinsdag 9 januari 2007 22:18
>To: Joris Dobbelsteen
>Cc: Chris Browne; pgsql-general(at)postgresql(dot)org
>Subject: Re: [GENERAL] Autovacuum Improvements
>
>Joris Dobbelsteen wrote:
>
>> Now we have at least one different model, lets mix in other captures
>> and situations. So it cannot be done with only YOUR data, I
>fully agree.
>> But if you have sufficient data you can find the
>generalization of the
>> model to make it work (resonable) in sufficient situations.
>> Of course models need time to evolve, but so does the implementation
>> currently at a slow rate. From do it yourself, to scripts, to the
>> current autovacuum integration (which is good). From doing
>all tables
>> sequentially to having some intelligence by update
>thresholds, to what
>> will be next.
>>
>> I think you should better solve the problem is this ways, as models
>> are relative easy to compare compared to arguments without
>> analyzable/simulatible data.
>
>To be frank, I'm not sure I understand what you're saying
>here. I'm sure more analysis is good; that's easy to agree with.

I hope to make it more clear.

With a model I mean something which I can use to:
* estimates the performance effects of 'defragmentation', excessive
pages or a low page fill factor.
* estimates how much vacuum affects the performance while it is running.
* as a result, estimate how the performance was improved as a result of
vacuuming.

With this its possible to guess how much vacuum costs and if you will
benefit from vacuuming. And hopefully it gives more details on how
performance is affected.

Examples (under DISCLAIMER, its my low-end machine):
* How does fragmentation affect inserts. I've observed that they take
constant time, regardless of the table size, transaction size or number
of columns. On my low-end system it was limited by the CPU. So here it
does not require vacuuming. Probably some test elsewhere will give
different results.
* How does it affect index scans? (I expect it mainly related to index
size and cache limits)
* How does it affect table scans (I expect it goes linear with the table
size).
* How does it affect updates?
* How does it affect deletes?
* (What did I forget?)
* Vacuum seems to scale linear with the table size. Futher its extremely
I/O intensive and not much else.

Do they depend on table size (like vacuum/table scan), on index size
(index scans?) or do they scale O(1) (inserts?)...

I've not yet found anything concrete, except several comments that it
will depend on your situation. However, the information should be
available or is mainly estimated by experience of DBAs.

>However, I don't want to be trapped in a design that's too
>hard to implement, or too hard for DBAs to manage.

Exactly...
If you can model it, you can implement it.

But you can also analyze it ahead of time:
* You can see what you need to configure, as its in the model. Perhaps
much can be done to automatically tune
* Furthermore you can predict its impact (expected performance) well
ahead of implementation and deployment.
* You can test your ideas on a model and see if it would work out, of
course leaving you to see how accurate they are in practical
situations...

That is why I proposed it.

>There have been proposals to add these knobs:
>- maximum number of simultaneous processes (and make it more than 1)
Indeed, vacuum takes to long on some tables. My finding is that the
vacuum time scales linearly with the number of pages. Perhaps vacuum can
be more efficient or do partial cleanups instead of full page scans?
>- times of day on which to change parameters (i.e. disable vacuum
> altogether, or make it more agressive or less)

- Something DBAs call "HOT" tables.
- There is something about maintenance windows, or would you rather let
the administrator specify when the database has it business hours.

There has been discussion about heuristics and metrics required for
that. In some part of the thread its called "dirtyness" which has a
quite ambigious meaning.

- Joris

Browse pgsql-general by date

  From Date Subject
Next Message km 2007-01-10 18:28:26 Re: Postgres Replication
Previous Message Reece Hart 2007-01-10 17:54:35 Re: Foreign Key Identification