help troubleshooting invalid page header error

From: Cory Zue <czue(at)dimagi(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: help troubleshooting invalid page header error
Date: 2014-12-23 03:17:28
Message-ID: CAF+m3tY==Es47vD03Lmv+ymtcaixQwJPaCm-Z_b4SfBZ-RiSjw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi all,

Our postgres instance on one of our production machines has recently been
returning errors of the form "DatabaseError: invalid page header in block 1
of relation base/16384/76623" from normal queries. I've been reading that
these are often linked to hardware errors, but I would like to better
understand what else it could be or how to determine that for sure. I've
filled out the standard issue reporting template below. Any feedback or
troubleshooting instructions would be much appreciated.

---
A description of what you are trying to achieve and what results you
expect.:

Intermittent queries are failing with the error "DatabaseError: invalid
page header in block 1 of relation base/16384/76623"

PostgreSQL version number you are running:

PostgreSQL 8.4.13 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit

How you installed PostgreSQL:

from standard package installer

Changes made to the settings in the postgresql.conf file:

name | current_setting | source

------------------------------+-----------------------------+----------------------
checkpoint_completion_target | 0.9 | configuration
file
checkpoint_segments | 32 | configuration
file
checkpoint_timeout | 15min | configuration
file
DateStyle | ISO, MDY | configuration
file
default_text_search_config | pg_catalog.english | configuration
file
effective_cache_size | 1GB | configuration
file
lc_messages | en_US.UTF-8 | configuration
file
lc_monetary | en_US.UTF-8 | configuration
file
lc_numeric | en_US.UTF-8 | configuration
file
lc_time | en_US.UTF-8 | configuration
file
log_checkpoints | on | configuration
file
log_connections | off | configuration
file
log_destination | csvlog | configuration
file
log_directory | /opt/data/pgsql/data/pg_log | configuration
file
log_disconnections | off | configuration
file
log_duration | on | configuration
file
log_filename | postgres-%Y-%m-%d_%H%M%S | configuration
file
log_lock_waits | on | configuration
file
log_min_duration_statement | 250ms | configuration
file
log_rotation_age | 1d | configuration
file
log_rotation_size | 1GB | configuration
file
log_temp_files | 0 | configuration
file
log_timezone | Asia/Kolkata | command line
log_truncate_on_rotation | on | configuration
file
logging_collector | on | configuration
file
maintenance_work_mem | 768MB | configuration
file
max_connections | 500 | configuration
file
max_stack_depth | 2MB | environment
variable
port | 5432 | command line
shared_buffers | 4GB | configuration
file
ssl | on | configuration
file
TimeZone | Asia/Kolkata | command line
timezone_abbreviations | Default | command line
wal_buffers | 16MB | configuration
file
work_mem | 48MB | configuration
file

It's also probably worth noting that postgres is installed on an encrypted
volume which is mounted using ecryptfs.

Operating system and version:

RedHatEnterpriseServer, version 6.6

What program you're using to connect to PostgreSQL:

Python (django)

Is there anything relevant or unusual in the PostgreSQL server logs?:

I see lots of instances of this error (and similar). I'm not sure what else
I should be looking for.

What you were doing when the error happened / how to cause the error:

I haven't explicitly tried to reproduce it, but it seems to consistently
happen with certain queries. However, the system was rebooted shortly
before the errors started occuring. The system was rebooted because another
database (elasticsearch) was having problems on the same machine and the
reboot was to attempt to resolve things.

The EXACT TEXT of the error message you're getting, if there is one:

DatabaseError: invalid page header in block 1 of relation base/16384/76623

(block and relation numbers change)

Unfortunately, I'm not completely familiar with the CPU and disk/RAID
configurations used on the server. However it is storing to a (software)
encrypted volume as mentioned above.

- Have you *ever* set fsync=off in the postgresql config file?

No

- Have you had any unexpected power loss lately? Replaced a failed RAID
disk? Had an operating system crash?

Not recently, though the system did reboot normally as described above.

- Have you run a file system check? (chkdsk / fsck)

No.

- Are there any error messages in the system logs? (unix/linux: dmesg,
/var/log/syslog ;

I haven't seen anything obvious but I wasn't sure what to look for.

thanks,
Cory

Responses

Browse pgsql-general by date

  From Date Subject
Next Message stefano bonnin 2014-12-23 07:11:36 Re: BDR conpilation error un Ubuntu 12.4 LTS
Previous Message David Johnston 2014-12-23 02:55:56 psql feature request: --list-conninfo (dump the config info psql would use to connect)