Re: Proposed LogWriter Scheme, WAS: Potential Large Performance

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Hannu Krosing <hannu(at)tm(dot)ee>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Curtis Faith <curtis(at)galtair(dot)com>, Pgsql-Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposed LogWriter Scheme, WAS: Potential Large Performance
Date: 2002-10-05 15:32:42
Message-ID: 3718.1033831962@sss.pgh.pa.us
Lists: pgsql-hackers

Hannu Krosing <hannu(at)tm(dot)ee> writes:
> The writer process should just issue a continuous stream of
> aio_write()'s while there are any waiters and keep track which waiters
> are safe to continue - thus no guessing of who's gonna commit.

This recipe sounds like "eat I/O bandwidth whether we need it or not".
It might be optimal in the case where activity is so heavy that we
do actually need a WAL write on every disk revolution, but in any
scenario where we're not maxing out the WAL disk's bandwidth, it will
hurt performance. In particular, it would seriously degrade performance
if the WAL file isn't on its own spindle but has to share bandwidth with
data file access.

What we really want, of course, is "write on every revolution where
there's something worth writing" --- either we've filled a WAL block
or there is a commit pending. But that just gets us back into the
same swamp of how-do-you-guess-whether-more-commits-will-arrive-soon.
I don't see how an extra process makes that problem any easier.
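
The "something worth writing" test described above can be sketched as a small predicate. This is only an illustration and not PostgreSQL code: the names WalState, BLOCK_SIZE and worth_writing are hypothetical, and the 8192-byte block size is an assumption made for the sketch.

```python
from dataclasses import dataclass

BLOCK_SIZE = 8192  # assumed WAL block size in bytes (hypothetical for this sketch)


@dataclass
class WalState:
    unwritten_bytes: int  # bytes sitting in the WAL buffer, not yet written
    pending_commits: int  # commit records waiting to reach disk


def worth_writing(state: WalState) -> bool:
    """Return True if a write on this disk revolution would be useful."""
    filled_block = state.unwritten_bytes >= BLOCK_SIZE
    commit_pending = state.pending_commits > 0
    return filled_block or commit_pending
```

The predicate only says whether a write is useful now. It says nothing about whether holding off a little longer would let more commits share the same write, which is the guessing problem the message refers to.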

BTW, it would seem to me that aio_write() buys nothing over plain write()
in terms of ability to gang writes. If we issue the write at time T
and it completes at T+X, we really know nothing about exactly when in
that interval the data was read out of our WAL buffers. We cannot
assume that commit records that were stored into the WAL buffer during
that interval got written to disk. The only safe assumption is that
only records that were in the buffer at time T are down to disk; and
that means that late arrivals lose. You can't issue aio_write
immediately after the previous one completes and expect that this
optimizes performance --- you have to delay it as long as you possibly
can in hopes that more commit records arrive. So it comes down to being
the same problem.
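
The timing argument above can be illustrated with a toy model. It is a sketch under stated assumptions, not a description of any real implementation: time is in arbitrary units, every write takes the same fixed duration, and the function names safely_durable and count_writes are made up for this example.

```python
def safely_durable(arrivals, issue_time):
    """Records that may be assumed on disk once a write issued at issue_time
    completes: only those already in the buffer at issue time."""
    return [t for t in arrivals if t <= issue_time]


def count_writes(arrivals, write_duration, delay):
    """Count the writes needed to cover every commit record when the writer
    waits `delay` time units before issuing each write."""
    arrivals = sorted(arrivals)
    now = 0.0    # time at which the writer is next idle
    writes = 0
    i = 0        # index of the first record not yet covered by a write
    while i < len(arrivals):
        # The writer can act once it is idle and a record is waiting.
        ready = max(now, arrivals[i])
        issue_time = ready + delay
        # Late arrivals lose: only records present at issue time are covered.
        while i < len(arrivals) and arrivals[i] <= issue_time:
            i += 1
        writes += 1
        now = issue_time + write_duration
    return writes


commits = [0.0, 1.0, 2.0, 3.0]
print(safely_durable(commits, 0.0))                        # [0.0]
print(count_writes(commits, write_duration=8.0, delay=0.0))  # 2
print(count_writes(commits, write_duration=8.0, delay=3.0))  # 1
```

In this model, issuing the write immediately covers only the first commit and forces a second write for the other three, while delaying by 3 units covers all four in one write. The sketch does not say how to pick the delay without knowing future arrivals, which is the same guessing problem again.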

regards, tom lane
