From: | "Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp> |
---|---|
To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "Lamar Owen" <lamar(dot)owen(at)wgcr(dot)org>, <pgsql-hackers(at)postgresql(dot)org>, "Alfred Perlstein" <bright(at)wintelcom(dot)net> |
Subject: | RE: How to shoot yourself in the foot: kill -9 postmaster |
Date: | 2001-03-07 04:38:36 |
Message-ID: | EKEJJICOHDIEMGPNIFIJIEMBDMAA.Inoue@tpf.co.jp |
Lists: | pgsql-hackers |
> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
>
> The interlock has to be tightly tied to the PGDATA directory, because
> what we're trying to protect is the files in and under that directory.
> It seems that something based on file(s) in that directory is the way
> to go.
>
> The best idea I've seen so far is Hiroshi's idea of having all the
> backends hold fcntl locks on the same file (probably postmaster.pid
> would do fine). Then the new postmaster can test whether any backends
> are still alive by trying to lock the old postmaster.pid file.
> Unfortunately, I read in the fcntl man page:
>
> Locks are not inherited by a child process in a fork(2) system call.
>
Yes, flock() works well here, but fcntl() doesn't.
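
The difference can be checked with a small experiment. The following is a minimal sketch, assuming a Unix system and a local filesystem; the function name and the lock file path are hypothetical and chosen only for illustration. The parent takes an exclusive lock and forks, and the child then makes a non-blocking lock request through the inherited descriptor. With flock() the request succeeds, because the lock belongs to the open file that parent and child share. With an fcntl()-style lock it fails, because the lock belongs to the parent process alone.

```python
import fcntl
import os


def child_shares_lock_after_fork(path, use_flock):
    """Return True if a forked child shares the parent's lock on `path`.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    # fcntl.lockf() is Python's wrapper around fcntl()-style record locks;
    # with the default length it covers the whole file.
    lock = fcntl.flock if use_flock else fcntl.lockf
    lock(fd, fcntl.LOCK_EX)

    pid = os.fork()
    if pid == 0:
        # Child: non-blocking request for the same lock on the inherited fd.
        code = 1
        try:
            lock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            code = 0  # request granted: the child shares the lock
        except OSError:
            code = 1  # request refused: the lock belongs to the parent only
        finally:
            os._exit(code)

    _, status = os.waitpid(pid, 0)
    os.close(fd)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

On a filesystem where flock() is emulated with fcntl() locks (some NFS setups), the two cases may behave the same.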
> This makes the idea much less attractive than I originally thought:
> a new backend would not automatically inherit a lock on the
> postmaster.pid file from the postmaster, but would have to open/lock it
> for itself. That means there's a window where the new backend exists
> but would be invisible to a hypothetical new postmaster.
>
> We could work around this with the following, very ugly protocol:
>
> 1. Postmaster normally maintains fcntl read lock on its postmaster.pid
> file. Each spawned backend immediately opens and read-locks
> postmaster.pid, too, and holds that file open until it dies. (Thus
> wasting a kernel FD per backend, which is one of the less attractive
> things about this.) If the backend is unable to obtain read lock on
> postmaster.pid, then it complains and dies. We must use read locks
> here so that all these processes can hold them separately.
>
> 2. If a newly started postmaster sees a pre-existing postmaster.pid
> file, it tries to obtain a *write* lock on that file. If it fails,
> conclude that an old postmaster or backend is still alive; complain
> and quit. If it succeeds, sit for say 1 second before deleting the file
> and creating a new one. (The delay here is to allow any just-started
> old backends to fail to acquire read lock and quit. A possible
> objection is that we have no way to guarantee 1 second is enough, though
> it ought to be plenty if the lock acquisition is just after the fork.)
>
I have another idea. My main point is not to remove the existing
pidfile. For example:
1) A newly started postmaster tries to obtain a write lock on the
first byte of the pidfile. If it fails, the postmaster quits.
2) The postmaster tries to obtain a write lock on the second byte
of the pidfile. If it fails, the postmaster quits.
3) The postmaster releases the lock obtained in 2).
4) Each backend obtains a read lock on the second byte of the
pidfile.
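
The four steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not PostgreSQL code: the function names are hypothetical, the locks are fcntl()-style byte-range locks taken through Python's fcntl.lockf(), and the pidfile contents are not written or checked.

```python
import errno
import fcntl
import os


def try_lock_byte(fd, offset, exclusive):
    """Try to lock the single byte at `offset` without blocking."""
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        fcntl.lockf(fd, mode | fcntl.LOCK_NB, 1, offset, os.SEEK_SET)
        return True
    except OSError as exc:
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return False  # another process holds a conflicting lock
        raise


def postmaster_start(pidfile_path):
    """Steps 1-3; return the open descriptor, or None if startup must abort."""
    fd = os.open(pidfile_path, os.O_RDWR | os.O_CREAT, 0o600)
    # 1) Write lock on the first byte: fails if another postmaster is alive.
    if not try_lock_byte(fd, 0, exclusive=True):
        os.close(fd)
        return None
    # 2) Write lock on the second byte: fails if any backend is still alive.
    if not try_lock_byte(fd, 1, exclusive=True):
        os.close(fd)
        return None
    # 3) Release the second byte so that backends can read-lock it.
    fcntl.lockf(fd, fcntl.LOCK_UN, 1, 1, os.SEEK_SET)
    return fd  # keep open: the first-byte lock lives as long as this process


def backend_start(pidfile_path):
    """Step 4; return the open descriptor, or None if the lock is refused."""
    fd = os.open(pidfile_path, os.O_RDWR)
    # 4) Read lock on the second byte, held until the backend exits.
    if not try_lock_byte(fd, 1, exclusive=False):
        os.close(fd)
        return None
    return fd
```

Note that a process loses all of its fcntl() locks on a file as soon as it closes any descriptor for that file, so both the postmaster and the backends would have to keep the descriptor open and avoid reopening and closing the pidfile elsewhere.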
Regards,
Hiroshi Inoue