Re: help troubleshooting invalid page header error

From: chiru r <chirupg(at)gmail(dot)com>
To: Cory Zue <czue(at)dimagi(dot)com>
Cc: Forums postgresql <pgsql-general(at)postgresql(dot)org>
Subject: Re: help troubleshooting invalid page header error
Date: 2014-12-25 18:06:44
Message-ID: CA+RSxMhvDP7w_v80nXidCO2gMsN1wHv9vWx1wRtmX1Piv9MRqw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Cory,

We have *zero_damaged_pages* parameter in PostgreSQL configuration,by
default it is set be *off*.
To recover data from corrupted table,we can turn *on* this parameter as a
super user and populate new table using dump or copy utility.

Note : The damaged pages we can't recover from table,it will set to 0 and
it will skip while fetching data from table.

Please follow below steps, if decided to recover data from corrupted table.

*Sample case :*

[postgres(at)instructor ~]$ /usr/local/pgsql/bin/psql
psql (9.4rc1)
Type "help" for help.

postgres=# select count(*) from test;
*ERROR: invalid page in block 7 of relation base/13003/16384*
postgres=# show zero_damaged_pages;
zero_damaged_pages
--------------------
off
(1 row)

postgres=# *set zero_damaged_pages=on;*
SET
postgres=# show zero_damaged_pages;
zero_damaged_pages
--------------------
on
(1 row)

postgres=# select count(*) from test;
*WARNING: invalid page in block 7 of relation base/13003/16384; zeroing
out page*
WARNING: invalid page in block 8 of relation base/13003/16384; zeroing out
page
WARNING: invalid page in block 9 of relation base/13003/16384; zeroing out
page
WARNING: invalid page in block 10 of relation base/13003/16384; zeroing
out page
WARNING: invalid page in block 11 of relation base/13003/16384; zeroing
out page
WARNING: invalid page in block 12 of relation base/13003/16384; zeroing
out page
WARNING: invalid page in block 13 of relation base/13003/16384; zeroing
out page

count
--------
979163
(1 row)

On Tue, Dec 23, 2014 at 8:47 AM, Cory Zue <czue(at)dimagi(dot)com> wrote:

> Hi all,
>
> Our postgres instance on one of our production machines has recently been
> returning errors of the form "DatabaseError: invalid page header in block
> 1 of relation base/16384/76623" from normal queries. I've been reading that
> these are often linked to hardware errors, but I would like to better
> understand what else it could be or how to determine that for sure. I've
> filled out the standard issue reporting template below. Any feedback or
> troubleshooting instructions would be much appreciated.
>
> ---
> A description of what you are trying to achieve and what results you
> expect.:
>
> Intermittent queries are failing with the error "DatabaseError: invalid
> page header in block 1 of relation base/16384/76623"
>
> PostgreSQL version number you are running:
>
> PostgreSQL 8.4.13 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
> 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit
>
> How you installed PostgreSQL:
>
> from standard package installer
>
> Changes made to the settings in the postgresql.conf file:
>
>
> name | current_setting |
> source
>
> ------------------------------+-----------------------------+----------------------
> checkpoint_completion_target | 0.9 |
> configuration file
> checkpoint_segments | 32 |
> configuration file
> checkpoint_timeout | 15min |
> configuration file
> DateStyle | ISO, MDY |
> configuration file
> default_text_search_config | pg_catalog.english |
> configuration file
> effective_cache_size | 1GB |
> configuration file
> lc_messages | en_US.UTF-8 |
> configuration file
> lc_monetary | en_US.UTF-8 |
> configuration file
> lc_numeric | en_US.UTF-8 |
> configuration file
> lc_time | en_US.UTF-8 |
> configuration file
> log_checkpoints | on |
> configuration file
> log_connections | off |
> configuration file
> log_destination | csvlog |
> configuration file
> log_directory | /opt/data/pgsql/data/pg_log |
> configuration file
> log_disconnections | off |
> configuration file
> log_duration | on |
> configuration file
> log_filename | postgres-%Y-%m-%d_%H%M%S |
> configuration file
> log_lock_waits | on |
> configuration file
> log_min_duration_statement | 250ms |
> configuration file
> log_rotation_age | 1d |
> configuration file
> log_rotation_size | 1GB |
> configuration file
> log_temp_files | 0 |
> configuration file
> log_timezone | Asia/Kolkata | command line
> log_truncate_on_rotation | on |
> configuration file
> logging_collector | on |
> configuration file
> maintenance_work_mem | 768MB |
> configuration file
> max_connections | 500 |
> configuration file
> max_stack_depth | 2MB | environment
> variable
> port | 5432 | command line
> shared_buffers | 4GB |
> configuration file
> ssl | on |
> configuration file
> TimeZone | Asia/Kolkata | command line
> timezone_abbreviations | Default | command line
> wal_buffers | 16MB |
> configuration file
> work_mem | 48MB |
> configuration file
>
> It's also probably worth noting that postgres is installed on an encrypted
> volume which is mounted using ecryptfs.
>
> Operating system and version:
>
> RedHatEnterpriseServer, version 6.6
>
> What program you're using to connect to PostgreSQL:
>
> Python (django)
>
> Is there anything relevant or unusual in the PostgreSQL server logs?:
>
> I see lots of instances of this error (and similar). I'm not sure what
> else I should be looking for.
>
> What you were doing when the error happened / how to cause the error:
>
> I haven't explicitly tried to reproduce it, but it seems to consistently
> happen with certain queries. However, the system was rebooted shortly
> before the errors started occuring. The system was rebooted because another
> database (elasticsearch) was having problems on the same machine and the
> reboot was to attempt to resolve things.
>
> The EXACT TEXT of the error message you're getting, if there is one:
>
> DatabaseError: invalid page header in block 1 of relation base/16384/76623
>
> (block and relation numbers change)
>
> Unfortunately, I'm not completely familiar with the CPU and disk/RAID
> configurations used on the server. However it is storing to a (software)
> encrypted volume as mentioned above.
>
>
> - Have you *ever* set fsync=off in the postgresql config file?
>
> No
>
> - Have you had any unexpected power loss lately? Replaced a failed
> RAID disk? Had an operating system crash?
>
> Not recently, though the system did reboot normally as described above.
>
> - Have you run a file system check? (chkdsk / fsck)
>
> No.
>
> - Are there any error messages in the system logs? (unix/linux: dmesg,
> /var/log/syslog ;
>
> I haven't seen anything obvious but I wasn't sure what to look for.
>
> thanks,
> Cory
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Andreas Ulbrich 2014-12-25 19:21:21 Re: Check constraint on foreign table using SQL function
Previous Message Adrian Klaver 2014-12-25 17:52:29 Re: Check constraint on foreign table using SQL function