Re: corrupt pages detected by enabling checksums

From: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
To: "ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu>
Cc: Jim Nasby <jim(at)nasby(dot)net>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: corrupt pages detected by enabling checksums
Date: 2013-05-13 13:24:24
Message-ID: CAKuK5J28t9hc51xssPng6zuAALVJhUKUi0k1E11znJ=3rqVpRg@mail.gmail.com
Lists: pgsql-hackers

On Mon, May 13, 2013 at 7:49 AM, ktm(at)rice(dot)edu <ktm(at)rice(dot)edu> wrote:
> On Sun, May 12, 2013 at 07:41:26PM -0500, Jon Nelson wrote:
>> On Sun, May 12, 2013 at 3:46 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
>> > On 5/10/13 1:06 PM, Jeff Janes wrote:
>> >>
>> >> Of course the paranoid DBA could turn off restart_after_crash and do a
>> >> manual investigation on every crash, but in that case the database would
>> >> refuse to restart even in the case where it is perfectly clear that all the
>> >> following WAL belongs to the recycled file and not the current file.
>> >
>> >
>> > Perhaps we should also allow for zeroing out WAL files before reuse (or just
>> > disable reuse). I know there's a performance hit there, but the reuse idea
>> > happened before we had bgWriter. Theoretically the overhead creating a new
>> > file would always fall to bgWriter and therefore not be a big deal.
>>
>> For filesystems like btrfs, re-using a WAL file is suboptimal to
>> simply creating a new one and removing the old one when it's no longer
>> required. Using fallocate (or posix_fallocate) (I have a patch for
>> that!) to create a new one is - by my tests - 28 times faster than the
>> currently-used method.
>>

> What about for less cutting (bleeding) edge filesystems?

The patch would be applicable for any filesystem which implements the
fallocate/posix_fallocate calls in an efficient fashion. xfs and ext4
would both work, if I recall properly. I'm certain there are others,
and the technique would probably work on other operating systems like
the *BSDs, etc. Additionally, it's improbable that there would be a
performance hit for other filesystems versus simply writing zeroes,
since that's the approach that is taken anyway as a fallback. Another
win is reduction in fragmentation. I would love to be able to turn off
WAL recycling to perform more useful testing.
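
To make the idea concrete, here is a minimal sketch of "preallocate with
posix_fallocate, fall back to writing zeroes". It is an illustration only,
not the patch mentioned above: the function name preallocate_segment, the
8 kB zero block, and the 16 MB size (PostgreSQL's default WAL segment size)
are assumptions chosen for the example, and it is written in Python rather
than the C the server itself uses.

```python
import os

# Assumption for illustration: PostgreSQL's default WAL segment size (16 MB).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024
# Assumption for illustration: chunk size used by the zero-fill fallback.
ZERO_BLOCK = b"\0" * 8192


def preallocate_segment(path, size=WAL_SEGMENT_SIZE):
    """Create a new file of `size` bytes at `path` (hypothetical helper).

    Tries os.posix_fallocate first; if the call is unavailable or fails,
    falls back to writing zeroes. Returns the name of the method used.
    """
    # O_EXCL: always create a fresh file instead of reusing an old one.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    try:
        try:
            # Ask the filesystem to reserve the whole range in one call.
            os.posix_fallocate(fd, 0, size)
            method = "fallocate"
        except (AttributeError, OSError):
            # Fallback: fill the file with zeroes block by block.
            os.lseek(fd, 0, os.SEEK_SET)
            remaining = size
            while remaining > 0:
                chunk = ZERO_BLOCK[:min(len(ZERO_BLOCK), remaining)]
                remaining -= os.write(fd, chunk)
            method = "zero-fill"
        # Make sure the allocation reaches stable storage before use.
        os.fsync(fd)
    finally:
        os.close(fd)
    return method
```

Either path leaves a file of the requested length that reads back as
zeroes; how much faster the first path is depends entirely on the
filesystem, so any comparison should be measured on the target system.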

--
Jon
