From: | Ning Yu <nyu(at)pivotal(dot)io> |
---|---|
To: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Paul Guo <pguo(at)pivotal(dot)io> |
Subject: | Possible race condition in pg_mkdir_p()? |
Date: | 2019-07-18 08:17:22 |
Message-ID: | CAKmaiL3fXqQHJ-b7M3RoPpfBR2CSroPHACwUb-r1TAxH0qqsHg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Hackers,
We found a race condition in pg_mkdir_p(), here is a simple reproducer:
#!/bin/sh
basedir=pgdata
datadir=$basedir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z
logdir=$basedir/logs
n=2
rm -rf $basedir
mkdir -p $logdir
# init databases concurrently, they will all try to create the parent
dirs
for i in `seq $n`; do
initdb -D $datadir/$i >$logdir/$i.log 2>&1 &
done
wait
# there is a chance one of the initdb commands failed to create the
datadir
grep 'could not create directory' $logdir/*
The logic in pg_mkdir_p() is as below:
/* check for pre-existing directory */
if (stat(path, &sb) == 0)
{
if (!S_ISDIR(sb.st_mode))
{
if (last)
errno = EEXIST;
else
errno = ENOTDIR;
retval = -1;
break;
}
}
else if (mkdir(path, last ? omode : S_IRWXU | S_IRWXG | S_IRWXO) <
0)
{
retval = -1;
break;
}
This seems buggy as it first checks the existence of the dir and makes the
dir if it does not exist yet, however when executing concurrently a
possible race condition can be as below:
A: does a/ exists? no
B: does a/ exists? no
A: try to create a/, succeed
B: try to create a/, failed as it already exists
To prevent the race condition we could mkdir() directly, if it returns -1
and errno is EEXIST then check whether it's really a dir with stat(). In
fact this is what is done in the `mkdir -p` command:
https://github.com/coreutils/gnulib/blob/b5a9fa677847081c9b4f26908272f122b15df8b9/lib/mkdir-p.c#L130-L164
By the way, some callers of pg_mkdir_p() checks for EEXIST explicitly, such
as in pg_basebackup.c:
if (pg_mkdir_p(statusdir, pg_dir_create_mode) != 0 && errno !=
EEXIST)
{
pg_log_error("could not create directory \"%s\": %m",
statusdir);
exit(1);
}
This is still wrong with current code logic, because when the statusdir is
a file the errno is also EEXIST, but it can pass the check here. Even if
we fix pg_mkdir_p() by following the `mkdir -p` way the errno check here is
still wrong.
Best Regards
Ning
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2019-07-18 08:51:48 | Re: Intermittent pg_ctl failures on Windows |
Previous Message | Michael Paquier | 2019-07-18 08:09:14 | Re: Compile from source using latest Microsoft Windows SDK |