From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Cc: | Mark Mielke <mark(at)mark(dot)mielke(dot)cc>, Florian Weimer <fw(at)deneb(dot)enyo(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Subject: | Directory fsync and other fun |
Date: | 2010-02-20 01:30:10 |
Message-ID: | 201002200230.16951.andres@anarazel.de |
Lists: | pgsql-hackers |
Hi all,
I started setting up some halfway automated method of simulating hard crashes
and even while setting those up I found some pretty unsettling results...
Now it's not unlikely that my testing is flawed, but unfortunately I don't see
where right now (it's 3am and I have an 8h train ride behind me, so ...)
The simple test setup I have so far:
Server script:
* setup disk
* start pg
* wait for getting killed
* setup disk
* start pg
Client side:
* CREATE DATABASE ... TEMPLATE crashtemplate
* CHECKPOINT
* make the device read-only, not allowing any cache flushes or such (using
device mapper)
* kill the server
* connect to the database (some of the time it errors here)
* select * from $every_table (sometimes it errors here)
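The "cut the device off" step can be sketched roughly as follows. This is not taken from the attached scripts; it is a minimal sketch that swaps in the device-mapper `error` target (which fails all I/O, reads included) instead of a strictly read-only mapping. The helper names and the `RUN` switch are hypothetical, and the device name is whatever the test filesystem is mounted from.

```shell
#!/bin/sh
# Hypothetical helpers, not the attached scripts.

# Build a device-mapper table that fails all I/O for a device of the
# given size (in 512-byte sectors).
dm_error_table() {
    printf '0 %s error\n' "$1"
}

# Print commands instead of running them unless RUN=1 is set, because
# swapping the table under a mounted filesystem is destructive.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Replace the device's table with the error target. The suspend asks
# device-mapper to skip the filesystem sync (--nolockfs) and not to
# flush outstanding I/O (--noflush).
cut_off_device() {
    name=$1
    sectors=$2
    run dmsetup suspend --noflush --nolockfs "$name"
    run dmsetup load "$name" --table "$(dm_error_table "$sectors")"
    run dmsetup resume "$name"
}
```

Without `RUN=1` this only prints the three `dmsetup` commands, e.g. `cut_off_device crashtest 2048`. The sector count of a real device can be obtained with `blockdev --getsz`. The server-side "setup disk" step would then have to load the original table again before remounting.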
At first pg survived that nicely without any problems. Then I came to my senses
and started adding some background I/O, like:
dd if=/dev/zero of=/mnt/test/foobar bs=10M count=1000
That's where things started failing. All of the following logs are from after the crash:
1:
FATAL: could not read relation mapping file "base/140883/pg_filenode.map": Interrupted system call
DEBUG: autovacuum: processing database "postgres"
FATAL: could not read relation mapping file "base/140883/pg_filenode.map": Success
DEBUG: autovacuum: processing database "postgres"
...
FATAL: could not read relation mapping file "base/58963/pg_filenode.map": No such file or directory
2:
FATAL: "base/165459" is not a valid data directory
DETAIL: File "base/165459/PG_VERSION" does not contain valid data.
HINT: You might need to initdb.
3:
You are now connected to database "test".
test=# SELECT execute('SELECT * FROM table_'||g.i) FROM generate_series(1, 3000) g(i);
ERROR: XX001: could not read block 0 in file "base/124499/11652": read only 0 of 8192 bytes
LOCATION: mdread, md.c:656
(that one I did not see with -o data=ordered,barrier=1,commit=300)
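The `execute` function in the query above is not a built-in and its definition is not shown here. A minimal sketch of the same read check that does not depend on it, assuming the tables are named table_1 .. table_N as in the query (the function name `gen_table_selects` is made up for illustration):

```shell
#!/bin/sh
# Emit one SELECT per table, assuming tables named table_1 .. table_N.
gen_table_selects() {
    i=1
    while [ "$i" -le "$1" ]; do
        printf 'SELECT * FROM table_%s;\n' "$i"
        i=$((i + 1))
    done
}
```

Piping the output into psql, e.g. `gen_table_selects 3000 | psql -X test`, runs each statement on its own, so with psql's default settings a read error on one table is reported and the remaining tables are still checked.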
I tried the following mount options/filesystems so far:
-t ext4 -o data=writeback,barrier=1,commit=300,noauto_da_alloc
-t ext4 -o data=writeback,barrier=1,commit=300
-t ext4 -o data=writeback,barrier=0,commit=300
-t ext4 -o data=ordered,barrier=0,commit=300,noauto_da_alloc
-t ext4 -o data=ordered,barrier=1,commit=300,noauto_da_alloc
-t ext4 -o data=ordered,barrier=1,commit=300
The same with s/ext4/ext3/ and with commit=5. With the latter the errors
were much harder to reproduce (not that surprisingly) but still occurred.
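For repeating the runs, the combinations above can be enumerated mechanically. A minimal sketch, assuming that "the same" means all six option sets were repeated for ext3 and for commit=5 (the function name `list_mount_variants` is made up; nothing is mounted, the variants are only printed):

```shell
#!/bin/sh
# Print one "-t <fs> -o <options>" line per tested combination:
# 6 option sets x 2 filesystems x 2 commit intervals.
list_mount_variants() {
    for fs in ext4 ext3; do
        for commit in 300 5; do
            for opts in \
                "data=writeback,barrier=1,commit=$commit,noauto_da_alloc" \
                "data=writeback,barrier=1,commit=$commit" \
                "data=writeback,barrier=0,commit=$commit" \
                "data=ordered,barrier=0,commit=$commit,noauto_da_alloc" \
                "data=ordered,barrier=1,commit=$commit,noauto_da_alloc" \
                "data=ordered,barrier=1,commit=$commit"
            do
                printf '%s\n' "-t $fs -o $opts"
            done
        done
    done
}
```

Each printed line is meant to be spliced into a mount command by the test driver, followed by the device and the mount point (/mnt/test in the dd example above).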
I attached my preliminary scripts/hacks... They even contain a comment or two.
Note though that they are a bit of a loaded gun...
I guess it would be sensible to do some more extensive tests on a setup
like that... All I have tested so far is CREATE DATABASE :-(
Andres
Attachment | Content-Type | Size |
---|---|---|
pg_crashtest_client.sh | application/x-shellscript | 595 bytes |
pg_crashtest_server.sh | application/x-shellscript | 855 bytes |
pg_createtemplate.sh | application/x-shellscript | 281 bytes |