Re: pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)
Date: 2016-09-15 01:55:49
Message-ID: CAB7nPqQJunJ4z7pzuuJhgwzO5gbw7e2fX8HHPMsVSo2FOK1b6w@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 15, 2016 at 9:44 AM, Peter Eisentraut
<peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
> On 9/12/16 11:16 PM, Michael Paquier wrote:
>>> I don't think tar file output in pg_basebackup needs to be fsynced.
>>> It's just a backup file like what pg_dump produces, and we don't fsync
>>> that either. The point of this change is to leave a data directory in
>>> a synced state equivalent to what initdb leaves behind.
>> Here I don't agree. The point of the patch is to make sure that what
>> gets generated by pg_basebackup is consistent on disk, for all the
>> formats. It seems weird to me that we could say that the plain format
>> makes things consistent while the tar format may cause some files to
>> be lost in case of power outage. The docs make it clear I think:
>> + By default, <command>pg_basebackup</command> will wait for all files
>> + to be written safely to disk.
>> But perhaps this should directly mention that this is done for all the formats?
>
> That doesn't really explain why we fsync.

Data durability, particularly on ext4, as discussed a couple of months
back [1]. In case of a crash it would be perfectly possible to lose
files, hence we need to be sure that the files themselves are flushed,
as well as their parent directory, to prevent any problems. I think
that we should protect users' backups as much as we can, within the
range of what we can do.
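
To make that more concrete, here is a minimal sketch of the flush
sequence I am talking about (function name and error handling are only
illustrative, this is not what the patch does exactly):

/*
 * Illustrative only: flush a file, then its parent directory, so that
 * both the file contents and the directory entry survive a power loss.
 */
#include <fcntl.h>
#include <libgen.h>      /* dirname() */
#include <string.h>
#include <unistd.h>

static int
fsync_file_and_parent(const char *path)
{
    char    buf[4096];
    int     fd;

    /* 1. flush the file contents */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    close(fd);

    /* 2. flush the parent directory, so the entry itself is on disk */
    strncpy(buf, path, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    fd = open(dirname(buf), O_RDONLY);
    if (fd < 0)
        return -1;
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}

Flushing only the file is not enough: if the directory entry is not
flushed as well, the file itself can still vanish after a crash.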

> If we think that all
> "important" files should be fsynced, why aren't we doing it in pg_dump?

pg_dump should do it where it can, see thread [2]. I am tackling
problems one at a time, and that's also a reason why I would like
us to have a common set of routines in src/common or src/fe_utils to
help improve this handling.
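
Just as a rough idea of the kind of interface such shared routines
could expose to frontend tools (none of these names are final):

/* Rough sketch only, not the actual patch */
extern int  fsync_fname(const char *fname, bool isdir);  /* flush one file or directory */
extern int  fsync_parent_path(const char *fname);        /* flush a file's parent directory */
extern int  fsync_dir_recurse(const char *dir);          /* flush a whole tree, e.g. a fresh backup */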

> Or psql, or server-side copy? Similarly, there is no straightforward
> mechanism by which you can unpack the tar file generated by
> pg_basebackup and get the unpacked directory fsynced properly. (I
> suppose initdb -S should be recommended.)

Yes, those are cases that we cannot cover. Imagine for example
pg_basebackup's tar output or pg_dump data sent to stdout. There is
nothing we can actually do for those cases. However, what we can do is
give a set of options making it possible for users to get consistent
backups.
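
For what it's worth, here is a tiny illustration (not from the patch)
of why the stdout case is a lost cause: when the output goes to a pipe
there is simply nothing we can flush, fsync() just fails.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
    /*
     * When stdout is a pipe, for example "prog | gzip > out.gz", there
     * is no file behind it and fsync() fails (EINVAL on Linux), so the
     * tool has no way to guarantee durability of that output.
     */
    if (fsync(STDOUT_FILENO) != 0)
        fprintf(stderr, "fsync on stdout: %s\n", strerror(errno));
    return 0;
}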

[1] Silent data loss on ext4:
https://www.postgresql.org/message-id/56583BDD.9060302@2ndquadrant.com

[2] Data durability:
https://www.postgresql.org/message-id/20160327233033.GD20662@awork2.anarazel.de
--
Michael
