Re: [ADMIN] ERROR invalid page header in block xxx of relation base/xxxxx/xxxxx/

From: Sreekanth Palluru <sree4pg(at)gmail(dot)com>
To: pgsql-admin(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org
Subject: Re: [ADMIN] ERROR invalid page header in block xxx of relation base/xxxxx/xxxxx/
Date: 2016-12-08 03:50:25
Message-ID: CAP+fnpiATfDQ-4ULpcUc=nf2ax7s_7oQQX2koBNnA9QkaJr2Xg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-general

Looping pgsql-general mail list

I see I am not clear in my question , below are the order of events we see
when we get a invalid page header in block corruption

-Windows server crashed/restarted due to power failure ( we believe) ( I
see that write-cache / write back cache / Disk cache are enabled and we
don't have battery backup )
-Posgres DB crashes (LOG: database system was not properly shut down;
automatic recovery in progress )
- After around 3hrs after crash we see below messages in log

2016-03-03 20:13:18 GMT ERROR: invalid page header in block 204 of
relation base/16413/16900
2016-03-03 20:13:18 GMT CONTEXT: automatic analyze of table "
a.parametertable" => this is a relation not Index

- select count(*) on table gives invalid page header message , where as
select count(*) order by primary key does not give any issue
- After clearing the blocks using zero_damaged_pages , total count of table
rows using the table and order by primary key matches which means there is
no dataloss ( I havent verified each record by record , I assume there is
no dataloss) please correct ??

I would like to know what was the cause of invalid page header and is there
any way I can reproduce this error ?

Appreciate your comments and suggestions on this ?

-Sreekanth

On Thu, Dec 8, 2016 at 12:55 PM, Sreekanth Palluru <sree4pg(at)gmail(dot)com>
wrote:

>
> Forgot to loop community
>
>
> Hi Jorge/All,
> Thanks for the reply.
> As per the documentation , I think backend reads the this page header and
> reports that it is damaged.
> I am looking at ways we re-create this scenario of creation this blank
> corrupted page ? do I have any control over Backend and after it
> initializes a new page using function PageInit
> <https://doxygen.postgresql.org/bufpage_8c.html#ab871202326b101c6ec24b7f628157c2c>
> (Page
> <https://doxygen.postgresql.org/bufpage_8h.html#a2010e3258a7075b32ad5750134ab9c5c>
> page, Size
> <https://doxygen.postgresql.org/c_8h.html#af9ecec2d692138fab9167164a457cbd4>
> pageSize, Size
> <https://doxygen.postgresql.org/c_8h.html#af9ecec2d692138fab9167164a457cbd4>
> specialSize) and I want to halt/crash the backend by stopping postgres
> database or through Hardware/system crash ?
> Is this possible ?
>
> Also I see that based on below comments from Source code, having such
> pages in database is normal . is my understanding correct ?
>
>
> /*
> 63 * PageIsVerified
> 64 * Check that the page header and checksum (if any) appear valid.
> 65 *
> 66 * This is called when a page has just been read in from disk. The
> idea is
> 67 * to cheaply detect trashed pages before we go nuts following bogus
> item
> 68 * pointers, testing invalid transaction identifiers, etc.
> 69 *
> 70 * It turns out to be necessary to allow zeroed pages here too. Even
> though
> 71 * this routine is *not* called when deliberately adding a page to a
> relation,
> 72 * there are scenarios in which a zeroed page might be found in a
> table.
> 73 * (Example: a backend extends a relation, then crashes before it can
> write
> 74 * any WAL entry about the new page. The kernel will already have the
> 75 * zeroed page in the file, and it will stay that way after restart.)
> So we
> 76 * allow zeroed pages here, and are careful that the page access macros
> 77 * treat such a page as empty and without free space. Eventually,
> VACUUM
> 78 * will clean up such a page and make it usable.
> 79 */
> zero_damaged_pages (boolean)
>
> Detection of a damaged page header normally causes PostgreSQL to report
> an error, aborting the current transaction. Setting zero_damaged_pages to
> on causes the system to instead report a warning, zero out the damaged page
> in memory, and continue processing. This behavior will destroy data,
> namely all the rows on the damaged page. However, it does allow you to get
> past the error and retrieve rows from any undamaged pages that might be
> present in the table. It is useful for recovering data if corruption has
> occurred due to a hardware or software error. You should generally not set
> this on until you have given up hope of recovering data from the damaged
> pages of a table. Zeroed-out pages are not forced to disk so it is
> recommended to recreate the table or the index before turning this
> parameter off again. The default setting is off, and it can only be
> changed by a superuser.
>
> On Thu, Dec 8, 2016 at 9:50 AM, Jorge Torralba <jorge(dot)torralba(at)gmail(dot)com>
> wrote:
>
>> Look into this setting.
>>
>> zero_damaged_pages = on;
>>
>> The docs should explain it.
>>
>>
>>
>> On Wed, Dec 7, 2016 at 2:47 PM, sreekanth Palluru <sree4pg(at)gmail(dot)com>
>> wrote:
>>
>>> Hi ,
>>>
>>> I am getting invalid page header error and what I could observe is when
>>> I select the table I get this error , where as if I select table and order
>>> by primary key I can retrieve the rows from table.
>>>
>>> And I don't see any dataloss ( based on total number records) after
>>> fixing the blocks using zero_damaged_pages=on and then vacuum full on the
>>> table
>>>
>>>
>>> Please note I have renamed few tables to avoid giving actual table names
>>>
>>> We run Postgres 9.2 version on Windows and Dell Optiplex XE2 server and
>>> we suspect that block corruption would have happen due to
>>> hardware/memory/power failures reasons and I have gone through wikik
>>> https://wiki.postgresql.org/wiki/Reliable_Writes.
>>>
>>> I want to understand why we can read the table through where there
>>> exists index and explain plan shows Index scan and with high cost compare
>>> to seq scan .
>>> I assume that since there no rows/data present in these corrupted blocks
>>> index scan skips these blocks and hence it is not throwing the error .
>>>
>>> Also , I want to know what would have caused the postgres to create
>>> these corrupted blocks and
>>> can I reproduce this error ? appreciate if you share any pointers to
>>> blogs/mailing lists if this type of issue is already discussed ?
>>>
>>>
>>> create table a.parametertable_bak as select * from a.parametertable
>>> order by id;
>>>
>>> labs=# select count(*) from a.parametertable_bak ;
>>> count
>>> -------
>>> 31415
>>> (1 row)
>>>
>>>
>>> labs=#
>>> labs=# checkpoint;
>>> CHECKPOINT
>>> labs=# set zero_damaged_pages=on;
>>> SET
>>> labs=# vacuum full a.parametertable;
>>> WARNING: invalid page header in block 204 of relation base/16413/16900;
>>> zeroing out page
>>> WARNING: invalid page header in block 205 of relation base/16413/16900;
>>> zeroing out page
>>> VACUUM
>>> labs=# select count(*) from a.parametertable ;
>>> count
>>> -------
>>> 31415
>>> (1 row)
>>>
>>>
>>> labs=#
>>> -Sreekanth
>>>
>>
>>
>>
>> --
>> Thanks,
>>
>> Jorge Torralba
>> ----------------------------
>>
>> Note: This communication may contain privileged or other confidential
>> information. If you are not the intended recipient, please do not print,
>> copy, retransmit, disseminate or otherwise use the information. Please
>> indicate to the sender that you have received this email in error and
>> delete the copy you received. Thank You.
>>
>
>
>
> --
> Regards
> Sreekanth
>
>
>
> --
> Regards
> Sreekanth
>

--
Regards
Sreekanth

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Peter Eisentraut 2016-12-08 13:55:20 Re: Are dns CNAME's allowed or useable in pg_hba.conf hostname specification
Previous Message Sreekanth Palluru 2016-12-08 01:55:39 Fwd: ERROR invalid page header in block xxx of relation base/xxxxx/xxxxx/

Browse pgsql-general by date

  From Date Subject
Next Message Charles Clavadetscher 2016-12-08 05:50:14 Re: When to use COMMENT vs --
Previous Message Metare Solve 2016-12-08 02:21:04 Re: Problems Importing table to pgadmin