From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | page corruption after moving tablespace |
Date: | 2010-07-23 06:50:43 |
Message-ID: | 1279867843.23350.28.camel@jdavis |
Lists: | pgsql-bugs |
I was investigating some strange page corruption today in which the page
was completely zeroed except for the LSN and TLI.
I found a sequence that can cause that problem even in 9.0:
(wal_level must be set to "archive" or greater)
1. Create a tablespace "t1"
2. Create a table "foo"
3. Attach to the backend with gdb, and set a breakpoint at the
START_CRITICAL_SECTION() line in heap_insert(). Continue in gdb.
4. Insert a tuple into foo.
5. gdb should break. At that time, send a SIGKILL.
6. Restart the server (if it doesn't restart itself)
7. ALTER TABLE foo SET TABLESPACE t1;
8. SELECT * FROM foo;
ERROR: invalid page header in block 0 of relation
pg_tblspc/16384/PG_9.1_201007151/11876/24576
The SIGKILL is just a way to get an all-zero page to end up in a heap
file. Any time a relation contains an all-zero page (which is generally
treated as a valid situation in postgres), changing the tablespace is a
problem. The code calls copy_relation_data, which calls log_newpage,
which sets the LSN and TLI on the page before it is written. On an
all-zero page, the result is neither all-zero nor properly initialized,
so it fails the page header check and the page is corrupt.
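The failure mode can be sketched with a toy model. Everything below is a simplified stand-in and not PostgreSQL source: ToyPageHeader, ToyPage, and the toy_* functions are hypothetical names, and the header layout is reduced to the few fields that matter here. The sketch assumes a validity check that accepts either a sane header or a fully zeroed page, which is the behavior described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLCKSZ 8192

/* Toy page header: a simplified stand-in, NOT PostgreSQL's real layout. */
typedef struct {
    uint64_t lsn;      /* WAL position of the last change */
    uint16_t tli;      /* timeline ID */
    uint16_t lower;    /* offset to start of free space */
    uint16_t upper;    /* offset to end of free space */
    uint16_t special;  /* offset to start of special space */
} ToyPageHeader;

/* A page is a fixed-size block whose first bytes are the header. */
typedef union {
    ToyPageHeader hdr;
    unsigned char bytes[TOY_BLCKSZ];
} ToyPage;

/* True if every byte of the page is zero. */
static bool toy_page_is_all_zero(const ToyPage *page)
{
    for (size_t i = 0; i < TOY_BLCKSZ; i++)
        if (page->bytes[i] != 0)
            return false;
    return true;
}

/* Accept either a sane header or a completely zeroed page. */
static bool toy_page_header_is_valid(const ToyPage *page)
{
    const ToyPageHeader *h = &page->hdr;

    if (h->lower >= sizeof(ToyPageHeader) &&
        h->lower <= h->upper &&
        h->upper <= h->special &&
        h->special <= TOY_BLCKSZ)
        return true;

    return toy_page_is_all_zero(page);
}

/* Stamp LSN and TLI on the page without touching any other field. */
static void toy_stamp_lsn(ToyPage *page, uint64_t lsn, uint16_t tli)
{
    assert(page != NULL);
    page->hdr.lsn = lsn;
    page->hdr.tli = tli;
}
```

In this model, a page cleared with memset passes toy_page_header_is_valid. After toy_stamp_lsn is called with a nonzero LSN, the same page fails: it is no longer all-zero, and its lower, upper, and special offsets are still zero.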
I think the simple fix would be to have copy_relation_data call
PageInit() if it's a new page. Are there other areas where a similar
problem might exist?
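A minimal sketch of the proposed fix, again as a toy model rather than PostgreSQL source. The names ToyPage, toy_page_init, toy_page_is_new, and toy_copy_block are hypothetical, and the header layout is simplified. The only idea taken from the proposal is the ordering: initialize a new page before stamping the LSN and TLI on it. The "new page" test here assumes that an uninitialized page has an upper offset of zero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLCKSZ 8192

/* Toy page header: a simplified stand-in, NOT PostgreSQL's real layout. */
typedef struct {
    uint64_t lsn;      /* WAL position of the last change */
    uint16_t tli;      /* timeline ID */
    uint16_t lower;    /* offset to start of free space */
    uint16_t upper;    /* offset to end of free space */
    uint16_t special;  /* offset to start of special space */
} ToyPageHeader;

typedef union {
    ToyPageHeader hdr;
    unsigned char bytes[TOY_BLCKSZ];
} ToyPage;

/* Assumption: a page that was never initialized has upper == 0. */
static bool toy_page_is_new(const ToyPage *page)
{
    return page->hdr.upper == 0;
}

/* Give the page an empty but well-formed header. */
static void toy_page_init(ToyPage *page)
{
    memset(page, 0, sizeof(*page));
    page->hdr.lower = (uint16_t) sizeof(ToyPageHeader);
    page->hdr.upper = TOY_BLCKSZ;
    page->hdr.special = TOY_BLCKSZ;
}

/* Header sanity check for an initialized page. */
static bool toy_page_header_is_sane(const ToyPage *page)
{
    const ToyPageHeader *h = &page->hdr;

    return h->lower >= sizeof(ToyPageHeader) &&
           h->lower <= h->upper &&
           h->upper <= h->special &&
           h->special <= TOY_BLCKSZ;
}

/* Copy one block; initialize it first if it is new, then stamp it. */
static void toy_copy_block(const ToyPage *src, ToyPage *dst,
                           uint64_t lsn, uint16_t tli)
{
    memcpy(dst, src, sizeof(*dst));
    if (toy_page_is_new(dst))
        toy_page_init(dst);
    dst->hdr.lsn = lsn;
    dst->hdr.tli = tli;
}
```

With this ordering, copying an all-zero source block yields a destination page with a well-formed empty header plus the new LSN and TLI. A source block that was already initialized is copied unchanged apart from the stamp.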
Regards,
Jeff Davis