Re: Checkpoint cost, looks like it is WAL/CRC

From: "Zeugswetter Andreas DAZ SD" <ZeugswetterA(at)spardat(dot)at>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Bruno Wolff III" <bruno(at)wolff(dot)to>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Greg Stark" <gsstark(at)mit(dot)edu>, "Russell Smith" <mr-russ(at)pws(dot)com(dot)au>, <josh(at)agliodbs(dot)com>, "Postgres Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Checkpoint cost, looks like it is WAL/CRC
Date: 2005-07-07 10:04:31
Message-ID: E1539E0ED7043848906A8FF995BDA57945BA2A@m0143.s-mxs.net
Lists: pgsql-hackers


>> Are you sure about that? That would probably be the normal case, but
>> are you promised that the hardware will write all of the sectors of a
>> block in order?
>
> I don't think you can possibly assume that. If the block
> crosses a cylinder boundary then it's certainly an unsafe
> assumption, and even within a cylinder (no seek required) I'm
> pretty sure that disk drives have understood "write the next
> sector that passes under the heads" for decades.

A lot of hardware exists that guards against partial writes
of a single I/O request (a persistent write cache for an HP RAID
controller for Intel servers costs ~$500 extra).

But the OS usually has a 4k (sometimes 8k) filesystem buffer size,
and since we do not use direct I/O for data files, the OS might decide
to schedule the two 4k writes that make up one 8k page separately.

If you do not build pg with a block size that matches your fs buffer
size, you cannot guard against partial writes with hardware :-(
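
To make the mismatch concrete, here is a minimal sketch in C of the
arithmetic involved. It is not PostgreSQL code: the function names are
hypothetical, and using statvfs()'s f_bsize as a stand-in for the
"filesystem buffer size" is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/statvfs.h>

/*
 * Number of filesystem-buffer-sized pieces that one database page is
 * split into. Each piece may be scheduled by the OS as a separate write.
 * (Hypothetical helper, for illustration only.)
 */
size_t
fs_buffers_per_page(size_t page_size, size_t fs_buffer_size)
{
    assert(page_size > 0 && fs_buffer_size > 0);
    /* Round up: a page smaller than one buffer still needs one write. */
    return (page_size + fs_buffer_size - 1) / fs_buffer_size;
}

/* Returns 1 if the OS may issue the page as more than one write. */
int
page_may_be_split(size_t page_size, size_t fs_buffer_size)
{
    return fs_buffers_per_page(page_size, fs_buffer_size) > 1;
}

/*
 * Look up the block size of the filesystem that holds `path`.
 * Returns 0 on failure. Treating f_bsize as the buffer size is an
 * assumption of this sketch.
 */
size_t
fs_block_size_of(const char *path)
{
    struct statvfs vfs;

    if (statvfs(path, &vfs) != 0)
        return 0;
    return (size_t) vfs.f_bsize;
}
```

With an 8k page on a 4k filesystem, fs_buffers_per_page(8192, 4096)
gives 2, so a controller that protects each single request still cannot
protect the page as a whole.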

We could alleviate that problem with direct I/O for data files.
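
As a rough sketch of what that could look like on Linux (not how
PostgreSQL does its data file I/O), the C code below submits one 8k
page as a single aligned request with O_DIRECT. The function names, the
8k page size, and the 4k alignment are assumptions for illustration.
Whether the device then writes the 8k atomically still depends on the
hardware.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux/glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* no O_DIRECT here: falls back to buffered I/O */
#endif

#define PAGE_BYTES  8192        /* assumed database page size */
#define ALIGN_BYTES 4096        /* assumed alignment required for direct I/O */

/* Allocate a zeroed page buffer aligned for direct I/O; NULL on failure. */
void *
alloc_page_buffer(void)
{
    void *buf = NULL;

    if (posix_memalign(&buf, ALIGN_BYTES, PAGE_BYTES) != 0)
        return NULL;
    memset(buf, 0, PAGE_BYTES);
    return buf;
}

/*
 * Write page number `pageno` to `path` as one 8k request that bypasses
 * the OS buffer cache. Returns 0 on success, -1 on any error, including
 * filesystems that reject O_DIRECT.
 */
int
write_page_direct(const char *path, const void *page, off_t pageno)
{
    int     fd;
    int     ok;
    ssize_t written;

    /* Direct I/O requires an aligned user buffer. */
    assert(page != NULL && ((uintptr_t) page % ALIGN_BYTES) == 0);

    fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0600);
    if (fd < 0)
        return -1;

    /* One request for the whole page, at a page-aligned file offset. */
    written = pwrite(fd, page, PAGE_BYTES, pageno * (off_t) PAGE_BYTES);

    /* O_DIRECT skips the cache but does not by itself imply durability. */
    ok = (written == PAGE_BYTES) && (fsync(fd) == 0);
    close(fd);
    return ok ? 0 : -1;
}
```

A caller would fill a buffer from alloc_page_buffer() and pass it to
write_page_direct() with the target page number. The point of the sketch
is only that the page reaches the I/O layer as one request instead of
two independently scheduled 4k buffers.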

Andreas
