From: | Greg Stark <gsstark(at)mit(dot)edu> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Greg Stark <gsstark(at)mit(dot)edu>, "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, "'Manfred Koizar'" <mkoi-pg(at)aon(dot)at>, "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Cost of XLogInsert CRC calculations |
Date: | 2005-05-31 16:02:12 |
Message-ID: | 873bs3wfi3.fsf@stark.xeocode.com |
Lists: | pgsql-hackers |
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> > Is the random WAL data really the concern? It seems like a more reliable way
> > of dealing with that would be to just accompany every WAL entry with a
> > sequential id and stop when the next id isn't the correct one.
>
> We do that, too (the xl_prev links and page header addresses serve that
> purpose). But it's not sufficient given that WAL records can span pages
> and therefore may be incompletely written.
Right, so the problem isn't that there may be stale data that is
indistinguishable from real data. The problem is that the real data may be
only partially written.
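To make the partial-write case concrete, here is a minimal sketch of how a per-record checksum catches a record whose tail never reached disk. The header layout (length plus CRC-32) and the names `encode_record` and `read_record` are hypothetical, and `zlib.crc32` is only a stand-in for whatever CRC the WAL code actually computes.

```python
import struct
import zlib

# Hypothetical header: payload length, then CRC-32 of the payload.
HEADER = struct.Struct("<II")

def encode_record(payload: bytes) -> bytes:
    """Prefix the payload with its length and checksum."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload

def read_record(buf: bytes):
    """Return the payload if the record is complete and intact, else None."""
    if len(buf) < HEADER.size:
        return None
    length, crc = HEADER.unpack_from(buf)
    payload = buf[HEADER.size:HEADER.size + length]
    # A short read or a checksum mismatch both mean the record is unusable.
    if len(payload) != length or zlib.crc32(payload) != crc:
        return None
    return payload

record = encode_record(b"x" * 100)
# Torn write: only the first 60 bytes reached disk; the rest is stale zeros.
torn = record[:60] + bytes(len(record) - 60)
```

Here `read_record(record)` returns the payload, while `read_record(torn)` returns `None`: the header is intact, so a sequence-id check alone would accept the record, but the checksum over the body does not match.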
> > The only truly reliable way to handle this would require two fsyncs per
> > transaction commit which would be really unfortunate.
>
> How are two fsyncs going to be better than one?
Well, you fsync the WAL entry, and only when that's complete do you flip a bit
marking the WAL entry as committed and fsync again.
Hm, you might need three fsyncs: one to make sure the bit isn't set before
writing out the WAL record itself.
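As a rough sketch of the three-fsync ordering described above (not how the WAL code works today): clear the flag and sync, write the record and sync, then set the flag and sync. The one-byte flag at offset 0, the function names, and the `crash_after` parameter used to simulate a crash are all assumptions made for illustration.

```python
import os
import tempfile

FLAG_OFFSET = 0  # hypothetical: one-byte "committed" flag
DATA_OFFSET = 1  # hypothetical: record body follows the flag

def _write_and_sync(f, offset, data):
    """Write data at offset and force it to stable storage."""
    f.seek(offset)
    f.write(data)
    f.flush()
    os.fsync(f.fileno())

def commit_record(f, payload, crash_after=3):
    """Run the three synced steps; crash_after < 3 simulates a crash."""
    steps = [
        (FLAG_OFFSET, b"\x00"),  # fsync 1: flag is known to be clear
        (DATA_OFFSET, payload),  # fsync 2: record body is on disk
        (FLAG_OFFSET, b"\x01"),  # fsync 3: mark the record committed
    ]
    for offset, data in steps[:crash_after]:
        _write_and_sync(f, offset, data)

def read_if_committed(f, length):
    """Return the record body only if the commit flag is set."""
    f.seek(FLAG_OFFSET)
    if f.read(1) != b"\x01":
        return None
    f.seek(DATA_OFFSET)
    return f.read(length)
```

With a file from `tempfile.TemporaryFile()`, calling `commit_record(f, b"hello", crash_after=2)` leaves `read_if_committed(f, 5)` returning `None`, and only a full `commit_record(f, b"hello")` makes the record visible. The cost is the obvious one: three synchronous waits per commit instead of one.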
--
greg