Re: vacuum, performance, and MVCC

From: "Mark Woodward" <pgsql(at)mohawksoft(dot)com>
To: "Csaba Nagy" <nagy(at)ecircle-ag(dot)com>
Cc: "PFC" <lists(at)peufeu(dot)com>, "Chris Browne" <cbbrowne(at)acm(dot)org>, "postgres hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-23 11:57:38
Message-ID: 18739.24.91.171.78.1151063858.squirrel@mail.mohawksoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>> I suppose you have a table memberships (user_id, group_id) or something
>> like it ; it should have as few columns as possible ; then try regularly
>> clustering on group_id (maybe once a week) so that all the records for a
>> particular group are close together. Getting the members of a group to
>> send them an email should be faster (less random seeks).
>
> It is like this, and some more bookkeeping data which must be there...
> we could split the table for smaller records or for updatable/stable
> fields, but at the end of the day it doesn't make much sense, usually
> all the data is needed and I wonder if more big/shallow tables instead
> of one big/wider makes sense...
>
> Regularly clustering is out of question as it would render the system
> unusable for hours. There's no "0 activity hour" we could use for such
> stuff. There's always something happening, only the overall load is
> smaller at night...
>

Let me ask a question, you have this hundred million row table. OK, how
much of that table is "read/write?" Would it be posible to divide the
table into two (or more) tables where one is basically static, only
infrequent inserts and deletes, and the other is highly updated?

The "big" thing in performance is the amount of disk I/O, if you have a
smaller active table with only a single index, then you may be able to cut
your disk I/O time really down. The smaller the row size, the more rows
fit into a block. The fewer blocks the less dissk I/O. The less disk I/O
the bbetter the performance.

Also, and anyone listening correct me if I'm wrong, you NEED to vacuum
frequently because the indexes grow and vacuuming them doesnt remove
everything, sometimes a REINDEX or a drop/recreate is the only way to get
performance back. So if you wait too long between vacuums, your indexes
grow and spread across more disk blocks than they should and thus use
more disk I/O to search and/or shared memory to cache.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2006-06-23 12:26:41 Re: [CORE] GPL Source and Copyright Questions
Previous Message Csaba Nagy 2006-06-23 11:53:55 Re: vacuum, performance, and MVCC