Re: pg_rewind failure by file deletion in source server

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_rewind failure by file deletion in source server
Date: 2015-06-23 12:19:00
Message-ID: 55894EB4.4080109@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 06/23/2015 07:51 AM, Michael Paquier wrote:
> So... Attached are a set of patches dedicated at fixing this issue:

Thanks for working on this!

> - 0001, add if_not_exists to pg_tablespace_location, returning NULL if
> path does not exist
> - 0002, same with pg_stat_file, returning NULL if file does not exist
> - 0003, same with pg_read_*file. I added them to all the existing
> functions for consistency.
> - 0004, pg_ls_dir extended with if_not_exists and include_dot_dirs
> (thanks Robert for the naming!)
> - 0005, as things get complex, a set of regression tests aimed to
> covering those things. pg_tablespace_location is platform-dependent,
> so there are no tests for it.
> - 0006, the fix for pg_rewind, using what has been implemented before.

With thes patches, pg_read_file() will return NULL for any failure to
open the file, which makes pg_rewind to assume that the file doesn't
exist in the source server, and will remove the file from the
destination. That's dangerous, those functions should check specifically
for ENOENT.

There's still a small race condition with tablespaces. If you run CREATE
TABLESPACE in the source server while pg_rewind is running, it's
possible that the recursive query that pg_rewind uses sees the symlink
in pg_tblspc/ directory, but its snapshot doesn't see the row in
pg_tablespace yet. It will think that the symlink is a regular file, try
to read it, and fail (if we checked for ENOENT).

Actually, I think we need try to deal with symlinks a bit harder.
Currently, pg_rewind assumes that anything in pg_tblspace that has a
matching row in pg_tablespace is a symlink, and nothing else is. I think
symlinks to directories. I just noticed that pg_rewind fails miserable
if pg_xlog is a symlink, because of that:

----
The servers diverged at WAL position 0/3023F08 on timeline 1.
Rewinding from last common checkpoint at 0/2000060 on timeline 1

"data-master//pg_xlog" is not a directory
Failure, exiting
----

I think we need to add a column to pg_stat_file output, to indicate
symbolic links, and add a pg_readlink() function. That still leaves a
race condition if the type of a file changes, i.e. a file is deleted and
a directory with the same name is created in its place, but that seems
acceptable. I don't think PostgreSQL ever does such a thing, so that
could only happen if you mess with the data directory manually while the
server is running.

I just realized another problem: We recently learned the hard way that
some people have files in the data directory that are not writeable by
the 'postgres' user
(http://www.postgresql.org/message-id/20150523172627.GA24277@msg.df7cb.de)
pg_rewind will try to overwrite all files it doesn't recognize as
relation files, so it's going to fail on those. A straightforward fix
would be to first open the destination file in read-only mode, and
compare its contents, and only open the file in write mode if it has
changed. It would still fail when the files really differ, but I think
that's acceptable.

I note that pg_rewind doesn't need to distinguish between an empty and a
non-existent directory, so it's quite silly for it to pass
include_dot_dirs=true, and then filter out "." and ".." from the result
set. The documentation should mention the main reason for including "."
and "..": to distinguish between an empty and non-existent directory.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2015-06-23 12:53:31 Re: A couple of newlines missing in pg_rewind log entries
Previous Message Sandro Santilli 2015-06-23 11:41:25 Memory context at PG_init call ?