From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: On markers of changed data
Date: 2017-10-06 15:39:46
Message-ID: 20171006153946.GD4628@tamriel.snowman.net
Lists: pgsql-hackers
Tom, Michael,
* Michael Paquier (michael(dot)paquier(at)gmail(dot)com) wrote:
> On Fri, Oct 6, 2017 at 11:22 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Andrey Borodin <x4mmm(at)yandex-team(dot)ru> writes:
> >> Is it safe to use file modification time to track that file were changes
> >> since previous backup?
> >
> > I'd say no:
> >
> > 1. You don't know the granularity of the filesystem's timestamps, at least
> > not without making unportable assumptions.
> >
> > 2. There's no guarantee that the system clock can't be set backwards.
> >
> > 3. It's not uncommon for filesystems to have optimizations whereby they
> > skip or delay some updates of file mtimes. (I think this is usually
> > optional, but you couldn't know whether it's turned on.)
> >
> > #2 is probably the worst of these problems.
>
> Or upwards. A simple example of things depending on clock changes is
> for example VM snapshotting. Any logic not depending on monotonic
> timestamps, with things like clock_gettime(CLOCK_MONOTONIC) is a lot
> of fun to investigate until you know that they are not using any
> monotonic logic... So the answer is *no*, do not depend on FS-level
> timestamps. The only sane method for Postgres is really to scan the
> page header LSNs, and of course you already know that.
Really, these comments appear, at least to me, to rest on an incorrect
assumption: that the only way to use mtime is the way tools like rsync
use it.
No, you can't trust rsync-based backups that look at mtime and only copy
if the mtime of the source file is currently 'more recent' than the
mtime of the destination file.
That doesn't mean that mtime can't be used to perform incremental
backups using PG, but care has to be taken when doing so to minimize the
risk of a file getting skipped that should have been copied.
There are a few things to do to minimize that risk:

Use mtime only as an indication of whether the file changed since the
last time you looked at it- it doesn't matter whether the mtime on the
file is newer or older. If the mtime is *different*, then you can't
trust that the contents are the same and you need to include the file in
the backup. Of course, combine this with checking whether the file size
has changed, though in PG there are lots of files of the same size, so
that's not a terribly good indicator on its own.
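The "different, not newer" rule can be sketched roughly like this in
Python (purely illustrative- the manifest layout and the needs_copy name
are my own inventions, not taken from any existing backup tool):

```python
import os

def needs_copy(path, manifest):
    """Decide whether 'path' must be included in this incremental backup.

    'manifest' maps path -> (mtime, size) as recorded at the previous
    backup.  Note the mtime test is inequality, not 'newer than': a
    difference in either direction means the contents can no longer be
    trusted to match what was backed up.
    """
    if path not in manifest:
        return True                     # new file, always copy
    old_mtime, old_size = manifest[path]
    st = os.stat(path)
    if st.st_size != old_size:
        return True                     # size changed, contents changed
    return st.st_mtime != old_mtime     # different in *either* direction
```

An rsync-style "copy only if newer" check would replace that last line
with st.st_mtime > old_mtime, which is exactly the comparison that
breaks when the clock moves backwards.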
Further, you have to get the mtimes of all the files *before* you start
backing them up. If you only take the mtime of a file at the time you
start actually copying it, then the file could be modified while you
copy it without the mtime moving past the value you pulled (and that's
not even talking about the concerns around the clock time moving back
and forth). To address the granularity concern, you should also wait,
after you collect all the mtimes but before actually starting the
backup, for at least the granularity of the filesystem's timestamps.
Any optimization which delays setting the mtime would, certainly, still
get around to updating the mtime before the next backup runs, so that
file might get copied even though it hadn't changed, but that's still
technically correct, just slightly more work.

Lastly, don't trust any mtimes which are from after the time that you
collected the mtimes- whether during the initial backup or during a
subsequent incremental. Any file whose mtime is different *or* is from
after the time the mtimes were collected should be copied.
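Putting those ordering rules together, a minimal sketch might look like
this (again hypothetical- the GRANULARITY value, plan_backup name, and
manifest shape are assumptions for illustration only):

```python
import os
import time

GRANULARITY = 2.0   # assumed worst-case FS timestamp granularity, in seconds

def plan_backup(paths, prev_manifest, granularity=GRANULARITY):
    """Return (files to copy, new manifest) for an incremental backup.

    'prev_manifest' maps path -> mtime as collected by the prior backup.
    Mtimes are snapshotted *before* any copying begins; a file is copied
    if its mtime differs from the previous manifest in either direction,
    or falls at/after the collection time and so can't be trusted.
    """
    collected_at = time.time()
    current = {p: os.stat(p).st_mtime for p in paths}   # snapshot all mtimes first
    time.sleep(granularity)    # wait out the granularity window before copying
    to_copy = []
    for p, mtime in current.items():
        prev = prev_manifest.get(p)
        if prev is None or mtime != prev or mtime >= collected_at:
            to_copy.append(p)
    return to_copy, current    # 'current' becomes the manifest for the next run
```

The sleep is what addresses the granularity concern from above: any
write that lands inside the timestamp-resolution window around
collection will produce an mtime at or after collected_at and be picked
up on the next pass.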
This isn't to say that there isn't some risk to using mtime; there still
is. If a backup is made of a file and its mtime collected, then time
moves backwards and the file is modified again at the *exact* same time,
leading the 'new' mtime to be identical to the 'old' mtime while the
file's contents are different, and the file is not subsequently modified
before the next backup happens, then the file might not be included in
that backup even though it should be.
Other risks are just blatant corruption happening in the mtime field, or
a kernel-level bug that doesn't update mtime when it should, or the
kernel somehow resetting the mtime back after the file has been changed,
or someone explicitly setting the mtime back after changing a file, or
perhaps other such attacks, though eliminating all of those risks isn't
possible (regardless of solution- someone could go change the LSN on a
page too, for example, and foil a tool which was based on that).
These are risks which I'd love to remove, but they also strike me as
quite small and ones which practical users are willing to accept for
their incremental and differential backups, though it's a reason to also
take full backups regularly.
As Alvaro notes downthread, it's also the only reasonable option
available today. It'd be great to have a better solution, and perhaps
one which summarizes the LSNs in each file would work and be better, but
that would also only be available for PG11, at the earliest.
Thanks!
Stephen