Re: Background writer process

From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Kurt Roeckx <Q(at)ping(dot)be>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Background writer process
Date: 2003-11-13 23:58:54
Message-ID: 3FB41ABE.8010304@Yahoo.com
Lists: pgsql-hackers

Kurt Roeckx wrote:
> On Thu, Nov 13, 2003 at 05:39:32PM -0500, Bruce Momjian wrote:
>> Jan Wieck wrote:
>> > Bruce Momjian wrote:
>> > > He found that write() itself didn't encourage the kernel to write the
>> > > buffers to disk fast enough. I think the final solution will be to use
>> > > fsync or O_SYNC.
>> >
>> > write() alone doesn't encourage the kernel to do any physical IO at all.
>> > As long as you have enough OS buffers, it does happy write caching until
>> > you checkpoint and sync(), and then the system freezes.
>>
>> That's not completely true. Some kernels have trickle sync, meaning
>> they sync a little bit regularly rather than all at once, so write()
>> does help get those shared buffers into the kernel for possible writing.
>> Also, it is possible the kernel will issue a sync() on its own.
>
> So basically on some kernels you want them to flush their dirty
> buffers faster.
>
> I have a feeling that how we ask the kernel not to keep data in
> memory too long should depend more on the system, and that maybe
> sync(), fsync() or O_SYNC could be a fallback in case it's needed
> and there are no better ways of doing it.
>
> Maybe something like posix_fadvise() might be useful too on systems
> that have it?

That is all fine and, as I said, how often, how much and how forcefully
we do the IO can all be configurable and as flexible as people see fit.
But whether you use sync(), fsync(), fdatasync(), O_SYNC, O_DSYNC or
posix_fadvise(), somewhere you have to do the write(). And that write
has to be coordinated with the buffer cache replacement strategy so that
you write those buffers that are likely to be replaced soon, and don't
write those that the strategy expects to keep for longer anyway. The
exception is a checkpoint, where you have to write whatever is dirty.
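
As a rough illustration of that coordination (not code from the patch),
here is a minimal C sketch of the selection step. SketchBuffer and
choose_buffers_to_write() are hypothetical names, and the array is
assumed to be already ordered by the replacement strategy, with the next
eviction candidate first.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer descriptor for illustration only. */
typedef struct {
    int  block_no;  /* disk block held in this buffer */
    bool dirty;     /* modified since it was last written */
} SketchBuffer;

/*
 * Assumption: bufs[0] is the buffer the replacement strategy would evict
 * next, and bufs[n-1] is the one it most wants to keep.
 *
 * Normal operation: pick at most max_writes dirty buffers, scanning from
 * the eviction end, so long-lived buffers are left alone.
 * Checkpoint: pick every dirty buffer, ignoring max_writes.
 *
 * The chosen indexes are stored in out[] (capacity n) and their count is
 * returned. The caller would then write() each chosen buffer.
 */
size_t choose_buffers_to_write(const SketchBuffer *bufs, size_t n,
                               size_t max_writes, bool checkpoint,
                               size_t *out)
{
    size_t chosen = 0;

    for (size_t i = 0; i < n; i++) {
        if (!checkpoint && chosen >= max_writes)
            break;              /* per-round quota reached */
        if (bufs[i].dirty)
            out[chosen++] = i;
    }
    return chosen;
}
```

Whether the kernel is then pushed with fsync(), O_SYNC or anything else
is a separate knob; the sketch only covers which buffers get handed to
write() in the first place.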

The patch I posted does this write() in coordination with the strategy
in a separate background process, so that the regular backends don't
have to write under normal circumstances (there are some places in DDL
statements that call BufferSync(), but those are exceptions IMHO). Can
we agree
on this general outline? Or do we have any better proposals?
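
To make the outline concrete, here is a hedged sketch of what one round
of such a background process could look like. It is not the posted
patch: SketchPage, SKETCH_BLOCK_SIZE and bgwriter_round() are invented
for illustration, the page array is assumed to be ordered with the next
eviction candidate first, and a short write is simply treated as an
error.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_BLOCK_SIZE 8192  /* assumed page size for the sketch */

/* Hypothetical in-memory page; not an actual PostgreSQL structure. */
typedef struct {
    off_t block_no;                 /* position of the page in the file */
    bool  dirty;                    /* needs writing */
    char  data[SKETCH_BLOCK_SIZE];  /* page contents */
} SketchPage;

/*
 * One round of a hypothetical background writer: write up to max_writes
 * dirty pages, starting from the eviction end of pages[], and optionally
 * force them to disk with fsync().
 *
 * Returns the number of pages written, or -1 on an I/O error.
 */
int bgwriter_round(int fd, SketchPage *pages, size_t n,
                   size_t max_writes, bool force_sync)
{
    int written = 0;

    for (size_t i = 0; i < n && (size_t) written < max_writes; i++) {
        if (!pages[i].dirty)
            continue;

        off_t   offset = pages[i].block_no * SKETCH_BLOCK_SIZE;
        ssize_t rc = pwrite(fd, pages[i].data, SKETCH_BLOCK_SIZE, offset);

        if (rc != SKETCH_BLOCK_SIZE)
            return -1;          /* simplification: no retry on short write */

        pages[i].dirty = false;
        written++;
    }

    /* "How forced" the IO is stays configurable via force_sync. */
    if (force_sync && written > 0 && fsync(fd) != 0)
        return -1;

    return written;
}
```

A separate process would call something like this in a loop with a
sleep between rounds, so that regular backends usually find a clean
buffer to reuse; at a checkpoint the quota would be lifted so that
everything dirty gets written.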

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #
