From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: silent data loss with ext4 / all current versions |
Date: | 2016-03-15 17:46:16 |
Message-ID: | 20160315174616.ju5ybldinilz6rxv@alap3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2016-03-15 15:39:50 +0100, Michael Paquier wrote:
> I have finally been able to spend some time reviewing what you pushed
> on back-branches, and things are in correct shape I think. One small
> issue that I have is that for EXEC_BACKEND builds, in
> write_nondefault_variables we still use one instance of rename().
Correctly so afaics, because write_nondefault_variables is by definition
non-durable. We write that stuff at every start / SIGHUP. Adding an
fsync there would be an unnecessary slowdown. I don't think it's good
policy adding fsync for stuff that definitely doesn't need it.
> Yeah, true. We definitely need to do something for that, even for HEAD
> it seems like an overkill to have something in for example src/common
> to allow frontends to have something if the fix is localized
> (pg_rewind may use something else), and it would be nice to finish
> wrapping that for the next minor release, so I propose the attached
> patches. At the same time, I think that adminpack had better be fixed
> as well, so there are actually three patches in this series, things
> that I shaped thinking about a backpatch btw, particularly for 0002.
I'm doubtful about "fixing" adminpack. We don't know how it's used, and
this could *seriously* increase its overhead for something potentially
used at a high rate.
> /*
> + * Wrapper of rename() similar to the backend version with the same function
> + * name aimed at making the renaming durable on disk. Note that this version
> + * does not fsync the old file before the rename as all the code paths leading
> + * to this function are already doing this operation. The new file is also
> + * normally not present on disk before the renaming so there is no need to
> + * bother about it.
> + */
> +static int
> +durable_rename(const char *oldfile, const char *newfile)
> +{
> + int fd;
> + char parentpath[MAXPGPATH];
> +
> + if (rename(oldfile, newfile) != 0)
> + {
> + /* durable_rename produced a log entry */
> + fprintf(stderr, _("%s: could not rename file \"%s\": %s\n"),
> + progname, current_walfile_name, strerror(errno));
> + return -1;
> + }
> +
> + /*
> + * To guarantee renaming of the file is persistent, fsync the file with its
> + * new name, and its containing directory.
> + */
> + fd = open(newfile, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR);
Why is S_IRUSR | S_IWUSR specified here?
Are you working on a fix for pg_rewind? Let's go with initdb -S in a
first iteration, then we can, if somebody is interest enough, work on
making this nicer in master.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | David Steele | 2016-03-15 17:50:12 | Re: silent data loss with ext4 / all current versions |
Previous Message | Corey Huinker | 2016-03-15 17:46:06 | Re: Soliciting Feedback on Improving Server-Side Programming Documentation |