Re: WAL write of full pages

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Shridhar Daithankar <shridhar(at)frodo(dot)hserus(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WAL write of full pages
Date: 2004-03-16 15:36:38
Message-ID: 24929.1079451398@sss.pgh.pa.us
Lists: pgsql-hackers

Shridhar Daithankar <shridhar(at)frodo(dot)hserus(dot)net> writes:
> We are hoping to prevent WAL page corruption which is part of file
> system corruption. Do we propose to tackle file system corruption in
> order to guarantee WAL integrity?

You really should study the code more before pontificating.

We *do* take measures to reduce the risk of file system corruption
breaking WAL. Specifically, a WAL segment is filled with zeroes and
fsync'd before we ever start to use it as live WAL space. The segment
is never extended while in use. Therefore, given a reasonable filesystem
implementation, the metadata for the segment file is down to disk before
we ever use it, and it does not change while we are using it.
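
The zero-fill-then-fsync step described above can be sketched roughly as follows. This is a minimal illustration, not PostgreSQL's actual code: the function name `preallocate_segment`, the 16 MB segment size, and the 8 KB write size are assumptions chosen for the example.

```python
import os

# Assumed values for illustration only; not taken from the message above.
SEGMENT_SIZE = 16 * 1024 * 1024
BLOCK_SIZE = 8192


def preallocate_segment(path, size=SEGMENT_SIZE, block=BLOCK_SIZE):
    """Create `path` at its final size, filled with zeroes, and fsync it.

    After this returns, the file never needs to grow, so later writes
    into it only overwrite blocks that are already allocated.
    """
    zeros = bytes(block)
    # O_EXCL: refuse to reuse an existing file as a "fresh" segment.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        remaining = size
        while remaining > 0:
            chunk = zeros[:min(block, remaining)]
            # os.write may write fewer bytes than requested.
            remaining -= os.write(fd, chunk)
        # Flush the zeroes (and the file's size) before the segment is used.
        os.fsync(fd)
    finally:
        os.close(fd)
```

Once a segment has been prepared this way, writing log records into it is a matter of overwriting existing bytes at a known offset, which is the property the paragraph above relies on.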

It's impractical to do the same for data files, of course, since they
have to be able to grow.

> I can not see why writing an 8K block is any more safe than writing just the
> changes.

It's not more safe, it's just a lot easier to manage. We'd need more
than just one "dirty" flag per buffer. In any case, the kernel would
likely force the write to be a multiple of its internal buffer size
anyway. I'm not sure that kernel buffers are as universally 8K as they
once were (doesn't Linux use 4K?) but trying to manage dirtiness down to
the byte level is a waste of time.
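
To make the rounding argument concrete, here is a small sketch of how many bytes end up being written when a change is rounded out to whole kernel buffers. It is a simplified model under stated assumptions: the helper name `bytes_written` is hypothetical, the 4K default is just the figure guessed at above, and `length` is assumed to be at least 1.

```python
def bytes_written(offset, length, block_size=4096):
    """Bytes written if a change of `length` bytes at `offset` is
    rounded out to whole blocks of `block_size` (assumes length >= 1)."""
    first_block = offset // block_size
    last_block = (offset + length - 1) // block_size
    return (last_block - first_block + 1) * block_size


# A 20-byte change inside one 4K buffer still costs a full 4K write.
print(bytes_written(100, 20))        # 4096
# The same 20 bytes straddling a 4K boundary cost two buffers.
print(bytes_written(4090, 20))       # 8192
# With 8K buffers, any small change inside the page costs the whole page.
print(bytes_written(100, 20, 8192))  # 8192
```

Under this model, tracking dirtiness at byte granularity saves nothing once the change is smaller than one kernel buffer.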

regards, tom lane
