From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Fundamental error in "no WAL log" index/file creation stuff |
Date: | 2005-06-25 19:27:51 |
Message-ID: | 28884.1119727671@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
I believe I have figured out the problem behind the recent reports we've
been seeing of "index is not a btree" errors. Here's how to reproduce
it (in 8.0 or HEAD):
-- drop database test if present
checkpoint;
create database test;
\c test
create table t1 ( name varchar(20) primary key );
-- kill -9 either current session or bgwriter, to force XLOG replay
-- now reconnect to test
vacuum t1;
ERROR: index "t1_pkey" is not a btree
On investigation, the filesystem shows t1_pkey exists but has size 0.
The reason for this is that the only entry in the XLOG concerning
t1_pkey is an "smgr create" record --- we didn't XLOG any of the
data inserted into the index, and particularly not the metapage.
Why is that a problem, if we fsynced the index? Because *replay of
CREATE DATABASE drops and recreates the entire database directory*.
This is not trivial to avoid, because the only way to generate the
new database is to copy from another database, and it's very hard
to tell what to copy if we want it done selectively.
It seems our choices are (a) somehow fix things so CREATE DATABASE
replay doesn't have to zap the whole directory, (b) force a checkpoint
immediately after any CREATE DATABASE, so that we never have to replay
one except in a PITR situation, or (c) abandon non-WAL-logged index
and table builds.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2005-06-25 19:52:08 | Re: autovacuum bootstrap |
Previous Message | Alvaro Herrera | 2005-06-25 18:23:52 | autovacuum bootstrap |