| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
|---|---|
| To: | pgsql-hackers(at)postgresql(dot)org | 
| Subject: | Re: Lockfile restart failure is still there :-( | 
| Date: | 2005-03-18 03:56:06 | 
| Message-ID: | 1150.1111118166@sss.pgh.pa.us | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
I wrote:
> I was thinking at the time, and still think, it is reasonable to treat
> EPERM as being a safe rather than unsafe case.
I remembered why we were rejecting that case originally: it protects us
against blowing away a Unix socket file belonging to a postmaster that's
running under someone else's userid.  Unlike the data directory
lockfile, we can't expect /tmp's ownership to tell us anything :-(.
However, we can still dispense with the EPERM check for the socket
lockfile too, because:
1. We create all these lockfiles with mode 0600.  If the lockfile were
made under another userid, we'd have bombed out earlier for fail to read
the lockfile.
2. On most modern platforms, /tmp has a directory stickybit and we'd not
be able to remove someone else's lockfile or socket file anyway.
So I think it's OK ...
I have applied the attached patch to HEAD and 8.0.x.  Can anyone comment
on whether it would be safe/reasonable to enable the data directory
ownership check on Windows?
regards, tom lane
*** src/backend/postmaster/postmaster.c.orig	Fri Mar 11 20:38:55 2005
--- src/backend/postmaster/postmaster.c	Thu Mar 17 22:29:27 2005
***************
*** 953,959 ****
--- 953,982 ----
  	}
  
  	/*
+ 	 * Check that the directory belongs to my userid; if not, reject.
+ 	 *
+ 	 * This check is an essential part of the interlock that prevents two
+ 	 * postmasters from starting in the same directory (see CreateLockFile()).
+ 	 * Do not remove or weaken it.
+ 	 *
+ 	 * XXX can we safely enable this check on Windows?
+ 	 */
+ #if !defined(WIN32) && !defined(__CYGWIN__)
+ 	if (stat_buf.st_uid != geteuid())
+ 		ereport(FATAL,
+ 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ 				 errmsg("data directory \"%s\" has wrong ownership",
+ 						DataDir),
+ 				 errhint("The server must be started by the user that owns the data directory.")));
+ #endif
+ 
+ 	/*
  	 * Check if the directory has group or world access.  If so, reject.
+ 	 *
+ 	 * It would be possible to allow weaker constraints (for example, allow
+ 	 * group access) but we cannot make a general assumption that that is
+ 	 * okay; for example there are platforms where nearly all users customarily
+ 	 * belong to the same group.  Perhaps this test should be configurable.
  	 *
  	 * XXX temporarily suppress check when on Windows, because there may not
  	 * be proper support for Unix-y file permissions.  Need to think of a
*** src/backend/utils/init/miscinit.c.orig	Fri Dec 31 17:46:19 2004
--- src/backend/utils/init/miscinit.c	Thu Mar 17 22:29:17 2005
***************
*** 505,510 ****
--- 505,513 ----
  	{
  		/*
  		 * Try to create the lock file --- O_EXCL makes this atomic.
+ 		 *
+ 		 * Think not to make the file protection weaker than 0600.  See
+ 		 * comments below.
  		 */
  		fd = open(filename, O_RDWR | O_CREAT | O_EXCL, 0600);
  		if (fd >= 0)
***************
*** 564,569 ****
--- 567,587 ----
  		 * then all but the immediate parent shell will be root-owned processes
  		 * and so the kill test will fail with EPERM.
  		 *
+ 		 * We can treat the EPERM-error case as okay because that error implies
+ 		 * that the existing process has a different userid than we do, which
+ 		 * means it cannot be a competing postmaster.  A postmaster cannot
+ 		 * successfully attach to a data directory owned by a userid other
+ 		 * than its own.  (This is now checked directly in checkDataDir(),
+ 		 * but has been true for a long time because of the restriction that
+ 		 * the data directory isn't group- or world-accessible.)  Also,
+ 		 * since we create the lockfiles mode 600, we'd have failed above
+ 		 * if the lockfile belonged to another userid --- which means that
+ 		 * whatever process kill() is reporting about isn't the one that
+ 		 * made the lockfile.  (NOTE: this last consideration is the only
+ 		 * one that keeps us from blowing away a Unix socket file belonging
+ 		 * to an instance of Postgres being run by someone else, at least
+ 		 * on machines where /tmp hasn't got a stickybit.)
+ 		 *
  		 * Windows hasn't got getppid(), but doesn't need it since it's not
  		 * using real kill() either...
  		 *
***************
*** 577,587 ****
  			)
  		{
  			if (kill(other_pid, 0) == 0 ||
! 				(errno != ESRCH
  #ifdef __BEOS__
! 				 && errno != EINVAL
  #endif
! 				 ))
  			{
  				/* lockfile belongs to a live process */
  				ereport(FATAL,
--- 595,605 ----
  			)
  		{
  			if (kill(other_pid, 0) == 0 ||
! 				(errno != ESRCH &&
  #ifdef __BEOS__
! 				 errno != EINVAL &&
  #endif
! 				 errno != EPERM))
  			{
  				/* lockfile belongs to a live process */
  				ereport(FATAL,
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Bruce Momjian | 2005-03-18 04:30:21 | Query crashes/hangs server | 
| Previous Message | Christopher Browne | 2005-03-18 03:28:47 | Re: Excessive growth of pg_attribute and other system tables |