Re: vacuum, performance, and MVCC

From:	"Mark Woodward" <pgsql(at)mohawksoft(dot)com>
To:	"Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Cc:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Csaba Nagy" <nagy(at)ecircle-ag(dot)com>, "Hannu Krosing" <hannu(at)skype(dot)net>, "Christopher Browne" <cbbrowne(at)acm(dot)org>, "postgres hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: vacuum, performance, and MVCC
Date:	2006-06-23 19:10:39
Message-ID:	18415.24.91.171.78.1151089839.squirrel@mail.mohawksoft.com
Lists:	pgsql-hackers

> On 6/23/06, Mark Woodward <pgsql(at)mohawksoft(dot)com> wrote:
>> I, for one, see a particularly nasty unscalable behavior in the
>> implementation of MVCC with regards to updates.
>
> I think this is a fairly common acceptance. The overhead required to
> perform an UPDATE in PostgreSQL is pretty heavy. Actually, it's not
> really PostgreSQL's implementation, but anything that employs basic
> multi-version timestamp ordering (MVTO) style MVCC. Basically,
> MVTO-style systems require additional work to be done in an UPDATE so
> that queries can find the most current row more quickly.
>
>> This is a very pessimistic behavior
>
> Yes, and that's basically the point of MVTO in general. The nice
> thing about MVTO-style MVCC is that it isn't super complicated. No
> big UNDO strategy is needed because the old versions are always there
> and just have to satisfy a snapshot.
>
>> I still think an in-place indirection to the current row could fix the
>> problem and speed up the database, there are some sticky situations that
>> need to be considered, but it shouldn't break much.
>
> I agree, but should make clear that moving to an in-place update isn't
> a quick-fix; it will require a good amount of design and planning.

This is NOT an "in-place" update. The whole MVCC strategy of keeping old
versions around doesn't change. The only thing that does change is one
level of indirection. Rather than keep references to all versions of all
rows in indexes, keep only a reference to the first or "key" version of
each row, and have that first version form the head of a linked list to
the subsequent versions of the row. The list will be in descending order.
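
As a rough illustration of the index-size difference (this is not PostgreSQL code; the function name and the tuple-id counter are made up for the sketch, and it assumes no cleanup runs between updates), the following Python counts index entries under both schemes:

```python
def count_index_entries(num_rows, updates_per_row):
    """Return (entries_all_versions, entries_key_row_only) before any cleanup.

    Hypothetical model: an index is a list of (row_key, tuple_id) pairs.
    """
    all_versions = []   # scheme 1: one index entry per row version
    key_row_only = []   # scheme 2: one index entry per logical row
    next_tid = 0        # made-up tuple id counter

    for row_key in range(num_rows):
        # Initial insert: both schemes index the first ("key") version.
        all_versions.append((row_key, next_tid))
        key_row_only.append((row_key, next_tid))
        next_tid += 1

        for _ in range(updates_per_row):
            # Each update writes a new version. Only the per-version
            # scheme needs a new index entry for it; with indirection
            # the new version is reached through the key row instead.
            all_versions.append((row_key, next_tid))
            next_tid += 1

    return len(all_versions), len(key_row_only)
```

With 1000 rows updated 10 times each, `count_index_entries(1000, 10)` gives 11000 entries for the per-version index and 1000 for the key-row-only index.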

In the vast majority of cases, the overhead of this action will be
trivial. In an unmodified row, you're there. In a modified row, you have
one extra lookup. In extreme cases, you may have to go back a few
versions, but I don't see that as a common behavior.

On a heavily updated row, you are never more than one jump away from the
current version, and the indexes shouldn't grow overly much.
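
To make the jump counts concrete, here is a minimal Python sketch of one way to read the scheme. It rests on assumptions that go beyond this message: the key row carries a `newest` pointer to the latest version, each later version links to the next-older one via `older`, and a version is visible when its `xmin` is at or below the snapshot's transaction id. Real visibility rules are more involved, and all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Version:
    value: str
    xmin: int                            # id of the transaction that wrote it
    older: Optional["Version"] = None    # next-older version in the chain
    newest: Optional["Version"] = None   # set only on the key row


def insert(index: Dict[str, Version], key: str, value: str, xid: int) -> None:
    # The index points at the first ("key") version and never moves.
    index[key] = Version(value, xid)


def update(index: Dict[str, Version], key: str, value: str, xid: int) -> None:
    key_row = index[key]
    # The new version links back to the previous newest one (or to the
    # key row itself on the first update). The index is left untouched.
    key_row.newest = Version(value, xid, older=key_row.newest or key_row)


def lookup(index: Dict[str, Version], key: str,
           snapshot_xid: int) -> Tuple[Optional[str], int]:
    """Return (visible value, number of jumps taken after the index hit)."""
    key_row = index[key]
    version: Optional[Version] = key_row
    hops = 0
    if key_row.newest is not None:
        version = key_row.newest         # the one extra lookup
        hops = 1
    # Simplified visibility test: walk toward older versions until one
    # was written at or before the snapshot.
    while version is not None and version.xmin > snapshot_xid:
        version = version.older
        hops += 1
    return (version.value if version is not None else None), hops
```

In this model an unmodified row is found with zero jumps, a current reader of a modified row takes exactly one jump no matter how many updates have happened, and only readers on older snapshots walk further down the chain.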

>
> What I find in these discussions is that we always talk about over
> complicating vacuum in order to fix the poor behavior in MVCC. Fixing
> autovacuum does not eliminate the overhead required to add index
> entries and everything associated with performing an UPDATE... it's
> just cleaning up the mess after the fact. As I see it, fixing the
> root problem by moving to update-in-place may add a little more
> complication to the core, but will eliminate a lot of the headaches we
> have in overhead, performance, and manageability.

Vacuum is a tool for removing old versions. I think there is an overly
eager tendency to have it fix other problems.

In response to

Re: vacuum, performance, and MVCC at 2006-06-23 18:28:26 from Jonah H. Harris

Responses

Re: vacuum, performance, and MVCC at 2006-06-23 19:12:42 from Jonah H. Harris
Re: vacuum, performance, and MVCC at 2006-06-24 06:54:56 from Jan Wieck
Re: vacuum, performance, and MVCC at 2006-06-24 12:36:05 from Martijn van Oosterhout
