From: | David Steele <david(at)pgmasters(dot)net> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: pg_upgrade and rsync |
Date: | 2015-01-26 23:08:48 |
Message-ID: | 54C6C900.80905@pgmasters.net |
Lists: | pgsql-hackers |
On 1/26/15 5:11 PM, Jim Nasby wrote:
>> The race condition is a problem for pg_start/stop_backup and friends.
>> In this instance, everything will be shut down when the rsync is
>> running, so there isn't a timestamp race condition to worry about.
>
> Yeah, I'm more concerned about people that use rsync to take base
> backups. Do we need to explicitly advise against that? Is there a way
> to work around this with a sleep after pg_start_backup to make sure
> all timestamps must be different? (Admittedly I haven't fully wrapped
> my head around this yet.)
A sleep in pg_start_backup() won't work. The race condition is in rsync:
it occurs when a file is modified again within the same second in which
it was copied. Waiting until the beginning of the next second in
pg_start_backup() would actually widen the window where the issue can
occur.
I solved this problem in PgBackRest (an alternative to barman, etc.) by
waiting the remainder of the second after the manifest is built before
copying. That way, if a file is modified in the second after the
manifest is built, that later version will still be copied. Any mods
after that will be copied in the next backup (as they should be).
PgBackRest does not use rsync, tar, etc., so I was able to code around
the issue.
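
As a rough illustration of the wait described above, here is a minimal
Python sketch. It is not PgBackRest's actual code: the function names
and the injectable sleep parameter are hypothetical, and it assumes
file timestamps are compared at one-second granularity.

```python
import math
import time


def seconds_until_next_whole_second(now: float) -> float:
    """Return the time left until the clock reaches the next whole second."""
    return math.floor(now) + 1 - now


def wait_out_manifest_second(manifest_built_at: float, sleep=time.sleep) -> None:
    """Sleep through the rest of the second in which the manifest was built.

    Copying starts only in the following second, so a file modified during
    the manifest's second is read in its later state rather than missed.
    """
    sleep(seconds_until_next_whole_second(manifest_built_at))
```

A caller would build its manifest, call
wait_out_manifest_second(time.time()), and only then start copying.
If the manifest finishes exactly on a second boundary, this sketch
waits a full second, which errs on the safe side.
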
The interesting thing about this race condition is that it does not
affect the backup where it occurs. It affects the next backup when the
modified file does not get copied because the timestamp is the same as
the previous backup. Of course, using checksums will solve the problem
in rsync, but that's expensive.
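
The effect can be sketched with a small in-memory Python model. This
is a simplified illustration, not rsync itself: FileState, sync and
the two comparison functions are hypothetical names, timestamps are
assumed to be whole seconds, and SHA-256 stands in for whatever
checksum rsync would use. The "quick check" here mirrors rsync's
default behaviour of comparing size and modification time.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FileState:
    data: bytes
    mtime: int  # modification time truncated to whole seconds


def quick_check_same(src: FileState, dst: FileState) -> bool:
    """Treat files as identical when size and mtime match."""
    return len(src.data) == len(dst.data) and src.mtime == dst.mtime


def checksum_same(src: FileState, dst: FileState) -> bool:
    """Treat files as identical only when their content hashes match."""
    return hashlib.sha256(src.data).digest() == hashlib.sha256(dst.data).digest()


def sync(src: FileState, dst: Optional[FileState],
         same: Callable[[FileState, FileState], bool]) -> FileState:
    """Return the destination state after one sync pass."""
    if dst is not None and same(src, dst):
        return dst  # considered unchanged, so nothing is copied
    return FileState(src.data, src.mtime)  # copy content and preserve mtime


# First backup copies the file during second 100.
source = FileState(b"page v1", mtime=100)
backup = sync(source, None, quick_check_same)

# The file is rewritten later in the same second, with the same size.
source = FileState(b"page v2", mtime=100)

# Second backup: the quick check skips the file, the checksum does not.
after_quick = sync(source, backup, quick_check_same)
after_checksum = sync(source, backup, checksum_same)
```

In this model the first backup is internally consistent, but
after_quick still holds the old content, while after_checksum picks
up the change at the cost of reading every file.
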
Thus my comment earlier that the hot rsync / cold rsync method is not
absolutely safe. If you do checksums on the cold rsync then you might
as well just use them the first time - you'll have the same downtime
either way.
I've written tests to show the rsync vulnerability and another to show
that this can affect a running database. However, to reproduce it
reliably you need to force a checkpoint or have them happening pretty
close together.
--
- David Steele
david(at)pgmasters(dot)net