From: | "Mark Woodward" <pgsql(at)mohawksoft(dot)com> |
---|---|
To: | "Hannu Krosing" <hannu(at)skype(dot)net> |
Cc: | "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, "Christopher Browne" <cbbrowne(at)acm(dot)org>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: vacuum, performance, and MVCC |
Date: | 2006-06-23 14:30:15 |
Message-ID: | 17943.24.91.171.78.1151073015.squirrel@mail.mohawksoft.com |
Lists: | pgsql-hackers |
> On Thu, 2006-06-22 at 12:41, Mark Woodward wrote:
>
>> > Depending on exact details and optimisations done, this can be either
>> > slower or faster than postgresql's way, but they still need to do
>> > something to get transactional visibility rules implemented.
>>
>> I think they have a different strategy. I think they maintain the notion
>> of "current version" of a row, and hunt for previous versions when
>> needed,
>> at least that's how I suspect Oracle does it with redo logs.
>
> Not "current" but "last" :)
True
>
> And one side effect of redo logs is that it is practically impossible to
> do large deletes on production databases. So you design around that,
> like you have to design around limitations of MVCC.
You think that's bad? Try doing a bulk update in PostgreSQL on a table with
20 million rows and a few indexes. I had to write a script that chunked the
update into segments and ran VACUUM between each one.
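A minimal sketch of that chunking approach. The table and column names (big_table, id, flag) are invented for illustration, and the executor is a stub that just collects the SQL rather than a real client connection:

```python
# Sketch of chunking a large UPDATE into segments with a VACUUM
# between each, so dead tuples are reclaimed as you go instead of
# piling up across one huge pass.

def run_sql(statements, sql):
    statements.append(sql)  # a real version would send this to the server

def chunked_update(min_id, max_id, chunk_size):
    statements = []
    lo = min_id
    while lo <= max_id:
        hi = min(lo + chunk_size - 1, max_id)
        run_sql(statements,
                f"UPDATE big_table SET flag = 1 "
                f"WHERE id BETWEEN {lo} AND {hi}")
        run_sql(statements, "VACUUM big_table")  # reclaim dead tuples now
        lo = hi + 1
    return statements

# 20 million rows in 1-million-row chunks: 20 updates, 20 vacuums
stmts = chunked_update(1, 20_000_000, 1_000_000)
```

The point is only the interleaving: each UPDATE creates one dead tuple per row it touches, and vacuuming before the next chunk keeps the table from bloating to twice its size.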
>
>> >> > > There has to be a more linear way of handling this scenario.
>> >> >
>> >> > So vacuum the table often.
>> >>
>> >> It's easy to say VACUUM often... but I'd bet that vacuuming is going
>> >> to lessen the throughput in his tests even more; no matter how it's
>> >> tuned.
>> >
>> > Running VACUUM often/continuously will likely keep his update rate
>> > fluctuations within a corridor of maybe 5-10%, at the cost of 1-2%
>> extra
>> > load. At least if vacuum is configured right and the server is not
>> > already running at 100% IO saturation, in which case it will be worse.
>>
>> Assuming the table is a reasonable size, the I/O required for vacuum
>> doesn't kill everything else!
>
> I have solved the problem of unnecessary IO by keeping active and
> finished rows in separate tables, with the finish() function moving the
> row between tables.
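If I follow the scheme, finish() amounts to moving one row from the hot table to an archive table in a single transaction. A stubbed sketch (the table names active_jobs and finished_jobs are hypothetical, and statements are returned as strings rather than executed):

```python
# Sketch of the "active vs. finished rows in separate tables" idea:
# finish() moves a completed row out of the small, actively-updated
# table, so continuous VACUUM only ever has to cover that table.

def finish(job_id):
    return [
        "BEGIN",
        f"INSERT INTO finished_jobs SELECT * FROM active_jobs WHERE id = {job_id}",
        f"DELETE FROM active_jobs WHERE id = {job_id}",
        "COMMIT",
    ]
```

Wrapping the INSERT and DELETE in one transaction matters: under MVCC a concurrent reader sees the row in exactly one of the two tables, never both and never neither.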
Sorry, but an RDBMS is a "relational database management system"; if you are
doing the "database management" yourself, it isn't a very good RDBMS.
>
> In case of the number of actively modified rows being in only tens or
> low hundreds of thousands of rows, (i.e. the modified set fits in
> memory) the continuous vacuum process shows up as just another backend,
> not really taking order of magnitude more resources. It mainly generates
> WAL traffic, as modified pages are already in memory/cache and are
> mostly synced by background writer and/or checkpoint.
>
> Of course you have to adjust vacuum_cost_* variables so as to not
> saturate IO.
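For reference, the cost-based vacuum delay knobs being discussed look roughly like this in postgresql.conf (the values shown are the stock defaults of the era, for illustration, not tuning advice):

```
# postgresql.conf - cost-based vacuum delay
vacuum_cost_delay = 10        # ms to sleep when the cost limit is reached (0 disables)
vacuum_cost_page_hit = 1      # cost of a page found in shared buffers
vacuum_cost_page_miss = 10    # cost of a page read from disk
vacuum_cost_page_dirty = 20   # cost of dirtying a previously clean page
vacuum_cost_limit = 200       # accumulated cost before vacuum sleeps
```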
These sorts of solutions, IMHO, don't show how good PostgreSQL is; they show
where it is badly lacking.
>
>> > The max throughput figure is not something you actually need very
>> often
>> > in production.
>>
>> No, but you need to have some degree of certainty and predictability in
>> the system you are developing.
>
> Yup. You have to design it so it has.
I was referring to the system as a whole and to its individual components.
PostgreSQL's performance under some pathological conditions is not very
predictable or reliable.
>
>> > What is interesting is setting up the server so that you
>> > can service your loads comfortably. Running the server at 100% load is
>> > not anything you want to do on production server. There will be things
>> > you need to do anyway and you need some headroom for that.
>>
>> Of course, you design it so peaks are easily managed, but unless you run
>> vacuum continuously, and that has its own set of problems, you run into
>> this problem, and it can get really really bad.
>
> Usually it gets really bad if you *don't* run vacuum continuously, maybe
> hoping to do it in slower times at night. For a high-update db you have
> to run it continuously, maybe having some 5-15 sec pauses between runs.
And how much I/O does this take?