From: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
---|---|
To: | PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Magnus Hagander <magnus(at)hagander(dot)net> |
Subject: | pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions) |
Date: | 2016-05-13 06:39:35 |
Message-ID: | CAB7nPqQ_B0j3n1t=8c1ZLHXF1b8Tf4XsXoUC9bP9t5Hab--SMg@mail.gmail.com |
Lists: | pgsql-hackers |
Hi all,
I am starting a new thread because the ext4 issues are closed, and because
pg_basebackup data durability merits a thread of its own. In short, the
problem is that pg_basebackup makes no effort to ensure that the data it
backs up is actually on disk, which is bad... One possible
recommendation is to run initdb -S after pg_basebackup, but
making sure that the data is on disk should be done before pg_basebackup
ends.
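For illustration only, here is a minimal POSIX sketch of what "making sure that data is on disk" involves for a single output file. Both the file contents and the directory entry that names the file have to be flushed. The helper names fsync_path() and write_durably() are hypothetical and are not the routines from the attached patches. The sketch also assumes a Linux-like system, where fsync() on a directory opened read-only is supported.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: flush one file or directory. Returns 0 or -1 (errno kept). */
int
fsync_path(const char *path, int is_dir)
{
    int fd = open(path, is_dir ? O_RDONLY : O_RDWR);
    if (fd < 0)
        return -1;

    int rc = fsync(fd);
    int save_errno = errno;
    close(fd);
    errno = save_errno;
    return rc;
}

/*
 * Hypothetical helper: write buf to path (located inside dir), then flush
 * the file contents and the parent directory so the new entry survives a crash.
 */
int
write_durably(const char *dir, const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Short writes are treated as errors to keep the sketch small. */
    if (write(fd, buf, len) != (ssize_t) len || fsync(fd) != 0)
    {
        int save_errno = errno;
        close(fd);
        errno = save_errno;
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    /* Without this, the file data may be durable while its name is not. */
    return fsync_path(dir, 1);
}
```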
On Thu, May 12, 2016 at 8:09 PM, I wrote:
> And actually this won't fly high if there is no equivalent of
> walkdir() or if the fsync()'s are not applied recursively. On master
> at least the refactoring had better be done cleanly first... For the
> back branches, we could just have some recursive call like
> fsync_recursively and keep that in src/bin/pg_basebackup. Andres, do
> you think that this should be part of fe_utils or src/common/? I'd
> tend to think the latter is more appropriate as there is an equivalent in
> the backend. On back-branches, we could just have something like
> fsync_recursively that walks through the paths. An even simpler
> approach would be to fsync() individually the things that have been
> written, but that would suck in performance.
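As a rough idea of what a back-branch fsync_recursively could look like, the sketch below uses POSIX nftw() rather than the backend's walkdir(). This is a substitution chosen only to keep the example short. FTW_DEPTH makes each directory be visited after its contents, so files are flushed before the directory entries that name them. FTW_PHYS means symlinks are not followed, so a symlinked pg_xlog or the tablespace links under pg_tblspc would need separate handling. Failures are only reported and counted, not treated as fatal. The function and its return convention are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <ftw.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Entries that could not be synced during the current walk. */
static int sync_failures = 0;

static int
fsync_entry(const char *fpath, const struct stat *sb, int typeflag,
            struct FTW *ftwbuf)
{
    (void) sb;
    (void) ftwbuf;

    /* Regular files and (post-order) directories only; symlinks are skipped. */
    if (typeflag != FTW_F && typeflag != FTW_DP)
        return 0;

    int fd = open(fpath, typeflag == FTW_DP ? O_RDONLY : O_RDWR);
    if (fd < 0 || fsync(fd) != 0)
    {
        fprintf(stderr, "could not fsync \"%s\": %s\n", fpath, strerror(errno));
        sync_failures++;
    }
    if (fd >= 0)
        close(fd);
    return 0;                   /* keep walking even after a failure */
}

/* Hypothetical: returns the number of failed entries, or -1 if the walk failed. */
int
fsync_recursively(const char *root)
{
    sync_failures = 0;
    if (nftw(root, fsync_entry, 64, FTW_PHYS | FTW_DEPTH) != 0)
        return -1;
    return sync_failures;
}
```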
So, attached are two patches that apply on HEAD to address the problem
of pg_basebackup not syncing the data it writes. pg_basebackup cannot
directly use initdb -S because, as a client-side utility, it may be
installed while initdb is not (see Fedora and RHEL). I have therefore
refactored the code so that the routines in initdb.c that fsync PGDATA,
along with the other fsync helpers, live in src/fe_utils/. This is
0001.
Patch 0002 is a set of fixes for pg_basebackup:
- In plain mode, fsync_pgdata is used so that all the tablespaces are
fsync'd at once. This also takes care of the case where pg_xlog is
a symlink.
- In tar mode (no stdout), each tar file is synced individually, and
the base directory is synced once at the end.
In both cases, failures are not considered fatal.
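The tar-mode behaviour described above can be sketched roughly as follows. Each tar file is synced on its own, and then the base directory is synced once. A failure produces a warning and is counted, but does not abort. The names sync_tar_output() and fsync_or_warn() are hypothetical, and the sketch is not the code in 0002.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Flush one path, printing a warning on failure. Returns 0 or -1. */
static int
fsync_or_warn(const char *path, int is_dir)
{
    int fd = open(path, is_dir ? O_RDONLY : O_RDWR);
    if (fd < 0)
    {
        fprintf(stderr, "warning: could not open \"%s\": %s\n",
                path, strerror(errno));
        return -1;
    }

    int rc = fsync(fd);
    if (rc != 0)
        fprintf(stderr, "warning: could not fsync \"%s\": %s\n",
                path, strerror(errno));
    close(fd);
    return rc == 0 ? 0 : -1;
}

/*
 * Hypothetical tar-mode step: sync every tar file written into basedir,
 * then basedir itself once. Returns the number of (non-fatal) failures.
 */
int
sync_tar_output(const char *basedir, const char *const tarfiles[], int ntar)
{
    char path[4096];
    int failures = 0;

    for (int i = 0; i < ntar; i++)
    {
        snprintf(path, sizeof(path), "%s/%s", basedir, tarfiles[i]);
        if (fsync_or_warn(path, 0) != 0)
            failures++;
    }

    /* The directory is synced only once, after all tar files. */
    if (fsync_or_warn(basedir, 1) != 0)
        failures++;
    return failures;
}
```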
With pg_basebackup -X and pg_receivexlog, the manipulation of WAL
files is made durable by using fsync and durable_rename where needed
(credit mainly goes to Andres for this part).
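A minimal sketch of the durable-rename pattern, under stated assumptions, is shown below. A typical case is a finished WAL segment that pg_receivexlog renames to drop its .partial suffix. The sequence is: flush the source file, rename it, flush the file under its new name, and then flush the directory that holds the new name so the rename itself persists. durable_rename_sketch() is a hypothetical stand-in and is not the backend's durable_rename() or the patch's version of it. It also assumes that directory fsync works as it does on Linux.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Flush one file or directory. Returns 0 or -1 (errno kept). */
static int
fsync_path(const char *path, int is_dir)
{
    int fd = open(path, is_dir ? O_RDONLY : O_RDWR);
    if (fd < 0)
        return -1;

    int rc = fsync(fd);
    int save_errno = errno;
    close(fd);
    errno = save_errno;
    return rc;
}

/* Hypothetical durable rename: contents first, then the name change. */
int
durable_rename_sketch(const char *oldpath, const char *newpath)
{
    char parent[4096];
    const char *slash = strrchr(newpath, '/');

    if (fsync_path(oldpath, 0) != 0)    /* data is on disk before renaming */
        return -1;
    if (rename(oldpath, newpath) != 0)
        return -1;
    if (fsync_path(newpath, 0) != 0)
        return -1;

    /* Parent directory of newpath: text before the last '/', or "." */
    if (slash == NULL)
        snprintf(parent, sizeof(parent), ".");
    else if (slash == newpath)
        snprintf(parent, sizeof(parent), "/");
    else
        snprintf(parent, sizeof(parent), "%.*s",
                 (int) (slash - newpath), newpath);

    /* Persist the directory entry change made by rename(). */
    return fsync_path(parent, 1);
}
```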
This set of patches is aimed only at HEAD. Back-patchable versions of
this patch would need to copy fsync_pgdata and friends into
streamutil.c, for example.
I am adding that to the next CF for review as a bug fix.
Regards,
--
Michael
Attachment | Content-Type | Size |
---|---|---|
0001-Relocation-fsync-routines-of-initdb-into-fe_utils.patch | application/x-download | 20.5 KB |
0002-Issue-fsync-more-carefully-in-pg_receivexlog-and-pg_.patch | application/x-download | 10.9 KB |