From: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> |
Subject: | Re: Is pg_control file crashsafe? |
Date: | 2016-05-05 06:22:57 |
Message-ID: | CAEepm=0hh_Dvd2Q+fcjYpkVzSoNX2+f167cYu5nwu=qh5HZhJw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, May 5, 2016 at 4:32 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
>> How about using 512 bytes as a write size and perform direct writes rather
>> than going via OS buffer cache for control file?
>
> Wouldn't that fail outright under a lot of implementations of direct write;
> ie the request needs to be page-aligned, for some not-very-determinate
> value of page size?
>
> To repeat, I'm pretty hesitant to change this logic. While this is not
> the first report we've ever heard of loss of pg_control, I believe I could
> count those reports without running out of fingers on one hand --- and
> that's counting since the last century. It will take quite a lot of
> evidence to convince me that some other implementation will be more
> reliable. If you just come and present a patch to use direct write, or
> rename, or anything else for that matter, I'm going to reject it out of
> hand unless you provide very strong evidence that it's going to be more
> reliable than the current code across all the systems we support.
I'm not sure how those ideas address the reported problem anyway: the
*length* was unexpectedly zero after a crash. UpdateControlFile
doesn't change the length of the control file, since it doesn't
specify O_TRUNC or O_APPEND and it always writes the same size. So it
seems like a pretty weird failure mode affecting filesystem metadata
(which I wouldn't expect to change anyway, but I would expect to be
journaled if it did), not a file-contents-atomicity problem. Whether
or not the page cache is involved in a write to a preallocated file
doesn't seem relevant to a case of unexpected truncation, and the
atomic rename trick doesn't seem relevant either unless someone with
expert knowledge of NTFS could explain how a crash could lead to
truncation in the first place, and how rename would help.
--
Thomas Munro
http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2016-05-05 06:29:57 | Re: Segmentation fault when max_parallel degree is very High |
Previous Message | Rushabh Lathia | 2016-05-05 05:39:27 | Re: pg_dump broken for non-super user |