Re: vacuum, performance, and MVCC

From: Hannu Krosing <hannu(at)skype(dot)net>
To: Mark Woodward <pgsql(at)mohawksoft(dot)com>
Cc: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Christopher Browne <cbbrowne(at)acm(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-23 13:30:52
Message-ID: 1151069452.3828.20.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Ühel kenal päeval, N, 2006-06-22 kell 12:41, kirjutas Mark Woodward:

> > Depending on exact details and optimisations done, this can be either
> > slower or faster than postgresql's way, but they still need to do
> > something to get transactional visibility rules implemented.
>
> I think they have a different strategy. I think they maintain the notion
> of "current version" of a row, and hunt for previous versions when needed,
> at least that's how I suspect Oracle does it with redo logs.

Not "current" but "last" :)

And one side effect of redo logs is that it is practically impossible to
do large deletes on production databases. So you design around that,
like you have to design around limitations of MVCC.

> >> > > There has to be a more linear way of handling this scenario.
> >> >
> >> > So vacuum the table often.
> >>
> >> It's easy to say VACUUM often... but I'd bet that vacuuming is going
> >> to lessen the throughput in his tests even more; no matter how it's
> >> tuned.
> >
> > Running VACUUM often/continuously will likely keep his update rate
> > fluctuatons within a corridor of maybe 5-10%, at the cost of 1-2% extra
> > load. At least if vacuum is configured right and the server is not
> > already running at 100% IO saturation, in which case it will be worse.
>
> Assuming the table is a reasonable size, the I/O required for vacuum
> doesn't kill everything else!

I have solved the problem of unneccessary IO by keeping active and
finished rows in separate tables, with the finish() function moving the
row between tables.

In case of the number of actively modified rows being in only tens or
low hundreds of thousands of rows, (i.e. the modified set fits in
memory) the continuous vacuum process shows up as just another backend,
not really taking order of magnitude more resources. It mainly generates
WAL traffic, as modified pages are already in memory/cache and are
mostly synced by background writer and/or checkpoint.

Of course you have to adjust vacuum_cost_* variables so as to not
saturate IO.

> > The max throughput figure is not something you actually need very often
> > in production.
>
> No, but you need to have some degree of certainty and predictability in
> the system you are developing.

Yup. You have to design it so it has.

> > What is interesting is setting up the server so that you
> > can service your loads comfortably. Running the server at 100% lead is
> > not anything you want to do on production server. There will be things
> > you need to do anyway and you need some headroom for that.
>
> Of course, you design it so peaks are easily managed, but unless you run
> vacuum continuously, and that has its own set of problems, you run into
> this problem, and it can get really really bad.

Usually it gets really bad if you *don't* run vacuum continuously, maybe
hopeing to do it in slower times at night. For high-update db you have
to run it continuously, maybe having some 5-15 sec pauses between runs.

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John DeSoi 2006-06-23 13:56:04 Re: [CORE] GPL Source and Copyright Questions
Previous Message Csaba Nagy 2006-06-23 13:30:45 Re: vacuum, performance, and MVCC