From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsqlrpms-hackers(at)pgfoundry(dot)org |
Cc: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Safer auto-initdb for RPM init script |
Date: | 2006-08-25 13:19:52 |
Message-ID: | 22918.1156511992@sss.pgh.pa.us |
Lists: | pgsql-hackers |
We've seen more than one report of corruption of PG databases that
seemed to be due to the willingness of the RPM init script to run
initdb if it thinks the data directory isn't there. This is pretty
darn risky on an NFS volume, for instance, which might be offline
at the instant the script looks. The failure case is:
- script doesn't see data directory
- script runs initdb and starts postmaster
- offline volume comes online
- KABOOM
The initdb creates a "local" database that's physically on the root
volume underneath the mountpoint directory for the intended volume.
After the mountable volume comes online, these files are shadowed
by the original database files. The problem is that by this point the
postmaster has a copy of pg_control in memory from the freshly-initdb'd
database, and that pg_control has a WAL end address and XID counter far
less than is correct for the real database. Havoc ensues, very probably
resulting in a hopelessly corrupt database.
I don't really want to remove the auto-initdb feature from the script,
because it's important not to drive away newbies by making Postgres
hard to start for the first time. But I think we'd better think about
ways to make it more bulletproof.
The first thought that comes to mind is to have the RPM install create
the data directory if not present and create a flag file in it showing
that it's safe to initdb. Then the script is allowed to initdb only
if it finds the directory and the flag file but not PG_VERSION.
Something like (untested off-the-cuff coding):
    %post server
    if [ ! -d "$PGDATA" ]; then
        mkdir "$PGDATA"
        touch "$PGDATA/NO_DATABASE_YET"
    fi
and in initscript:
    if [ -d "$PGDATA" ] && [ -f "$PGDATA/NO_DATABASE_YET" ] && [ ! -f "$PGDATA/PG_VERSION" ]; then
        rm -f "$PGDATA/NO_DATABASE_YET" && initdb ...
    fi
If the data directory is not mounted then the -d test would fail,
unless the directory is itself the mount point, in which case it
would be there but not contain the NO_DATABASE_YET file.
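
To make those cases explicit, the test can be pulled out into a small shell function. This is only a sketch: the name may_initdb and its directory argument are made up for illustration, and the real init script would presumably test $PGDATA inline.

```shell
# Hypothetical helper (not part of any existing init script):
# returns 0 only when running initdb on the given directory looks safe.
may_initdb() {
    pgdata=$1
    # Directory missing: the volume is probably not mounted.
    [ -d "$pgdata" ] || return 1
    # No flag file: a bare mount point, or initdb was already attempted.
    [ -f "$pgdata/NO_DATABASE_YET" ] || return 1
    # PG_VERSION present: a database already lives here.
    [ ! -f "$pgdata/PG_VERSION" ] || return 1
    return 0
}
```

The init script would then run initdb only when may_initdb "$PGDATA" succeeds, and refuse to start otherwise.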
I can still imagine ways for this to fail, eg if you run an RPM
install or upgrade while your mountable data directory is offline.
But it ought to be an order of magnitude safer than things are now.
(Hm, maybe the %post script should only run during an RPM install,
not an upgrade.)
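
For the install-versus-upgrade distinction, RPM passes %post a count in $1 that is 1 on a first install and 2 or more on an upgrade. A rough sketch of how the scriptlet could use it follows; the body is wrapped in a function with hypothetical name and arguments (post_server, count, pgdata) purely so it can be tried outside of RPM, and it is as untested in a real package as the snippet above.

```shell
# Sketch of the %post logic, wrapped in a function for experimentation.
# $1: instance count as RPM passes it to %post (1 = fresh install, 2+ = upgrade)
# $2: data directory (hypothetical parameter; the scriptlet itself would use $PGDATA)
post_server() {
    count=$1
    pgdata=$2
    # Only create the directory and the flag file on a first install.
    if [ "$count" -eq 1 ] && [ ! -d "$pgdata" ]; then
        mkdir "$pgdata" && touch "$pgdata/NO_DATABASE_YET"
    fi
}
```

With this, an upgrade performed while the data volume is offline would leave the mount point untouched instead of planting a flag file under it.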
Comments? Anyone see a better way?
regards, tom lane