Re: help troubleshooting invalid page header error

From: chiru r <chirupg(at)gmail(dot)com>
To: Cory Zue <czue(at)dimagi(dot)com>
Cc: Sameer Kumar <sameer(dot)kumar(at)ashnik(dot)com>, PostgreSQL General Discussion Forum <pgsql-general(at)postgresql(dot)org>
Subject: Re: help troubleshooting invalid page header error
Date: 2014-12-26 19:35:53
Message-ID: CA+RSxMj1b9LKO1432-MvqZR9EmngwugYUUMM_vwMGZLz-5s5Eg@mail.gmail.com
Lists: pgsql-general

Hi Cory,

After recovering the table, turn the *zero_damaged_pages* parameter back off.
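
To confirm which table the error refers to, the relation path in the message can be mapped back to catalog entries. Below is a minimal Python sketch, assuming the message has the same shape as the one quoted further down in this thread. The helper name `lookup_queries` is hypothetical. For a path under base/, the first number is the database OID and the second is the relfilenode of the relation.

```python
import re

# Matches messages shaped like the one quoted in this thread, e.g.
# "invalid page header in block 1 of relation base/16384/76623"
PAGE_ERROR = re.compile(
    r"invalid page header in block (\d+) of relation base/(\d+)/(\d+)"
)


def lookup_queries(message):
    """Return (block, db_query, table_query) for a page header error, or None.

    Hypothetical helper for illustration only: it builds the catalog
    queries as strings and does not connect to any server.
    """
    match = PAGE_ERROR.search(message)
    if match is None:
        return None
    block, db_oid, filenode = (int(group) for group in match.groups())
    # Which database owns the file (can be run from any database).
    db_query = "SELECT datname FROM pg_database WHERE oid = %d;" % db_oid
    # Which relation the file belongs to (run while connected to that database).
    table_query = (
        "SELECT relname, relkind FROM pg_class WHERE relfilenode = %d;" % filenode
    )
    return block, db_query, table_query


if __name__ == "__main__":
    error = (
        "DatabaseError: invalid page header in block 1 "
        "of relation base/16384/76623"
    )
    for item in lookup_queries(error):
        print(item)
```

If relkind comes back as 'i', the damaged file is an index rather than a table, so rebuilding it with REINDEX may be enough.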

On Fri, Dec 26, 2014 at 9:13 PM, Cory Zue <czue(at)dimagi(dot)com> wrote:

> Hi all,
>
> Thanks for the responses. Chiru, I'm looking into your suggestion.
>
> Sameer, here is the kernel version info:
>
> Linux dimagi 2.6.32-431.20.5.el6.x86_64 #1 SMP Wed Jul 16 05:26:53 EDT
> 2014 x86_64 x86_64 x86_64 GNU/Linux
>
> Does that seem like it could be a problematic version?
>
> More generally - I'm still wondering whether I should chalk this failure
> up to a transient/random issue, or whether I should be more worried about
> the hardware on the machine. According to our diagnostic tools, disk and
> memory are fine, but it's still not clear to me how it got into this state.
> Any general bits of information regarding the potential causes of these
> types of issues would be much appreciated.
>
> thanks,
> Cory
>
>
> On Fri, Dec 26, 2014 at 6:55 AM, Sameer Kumar <sameer(dot)kumar(at)ashnik(dot)com>
> wrote:
>
>> On 23 Dec 2014 12:05, "Cory Zue" <czue(at)dimagi(dot)com> wrote:
>> >
>> > Hi all,
>> >
>> > Our postgres instance on one of our production machines has recently
>> been returning errors of the form "DatabaseError: invalid page header in
>> block 1 of relation base/16384/76623" from normal queries. I've been
>> reading that these are often linked to hardware errors, but I would like to
>> better understand what else it could be or how to determine that for sure.
>> I've filled out the standard issue reporting template below. Any feedback
>> or troubleshooting instructions would be much appreciated.
>> >
>> > ---
>> > A description of what you are trying to achieve and what results you
>> expect.:
>> >
>> > Intermittent queries are failing with the error "DatabaseError: invalid
>> page header in block 1 of relation base/16384/76623"
>> >
>> > PostgreSQL version number you are running:
>> >
>> > PostgreSQL 8.4.13 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
>> 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit
>> >
>> > How you installed PostgreSQL:
>> >
>> > from standard package installer
>> >
>> > Changes made to the settings in the postgresql.conf file:
>> >
>> >
>> > name                         | current_setting             | source
>> > -----------------------------+-----------------------------+---------------------
>> > checkpoint_completion_target | 0.9                         | configuration file
>> > checkpoint_segments          | 32                          | configuration file
>> > checkpoint_timeout           | 15min                       | configuration file
>> > DateStyle                    | ISO, MDY                    | configuration file
>> > default_text_search_config   | pg_catalog.english          | configuration file
>> > effective_cache_size         | 1GB                         | configuration file
>> > lc_messages                  | en_US.UTF-8                 | configuration file
>> > lc_monetary                  | en_US.UTF-8                 | configuration file
>> > lc_numeric                   | en_US.UTF-8                 | configuration file
>> > lc_time                      | en_US.UTF-8                 | configuration file
>> > log_checkpoints              | on                          | configuration file
>> > log_connections              | off                         | configuration file
>> > log_destination              | csvlog                      | configuration file
>> > log_directory                | /opt/data/pgsql/data/pg_log | configuration file
>> > log_disconnections           | off                         | configuration file
>> > log_duration                 | on                          | configuration file
>> > log_filename                 | postgres-%Y-%m-%d_%H%M%S    | configuration file
>> > log_lock_waits               | on                          | configuration file
>> > log_min_duration_statement   | 250ms                       | configuration file
>> > log_rotation_age             | 1d                          | configuration file
>> > log_rotation_size            | 1GB                         | configuration file
>> > log_temp_files               | 0                           | configuration file
>> > log_timezone                 | Asia/Kolkata                | command line
>> > log_truncate_on_rotation     | on                          | configuration file
>> > logging_collector            | on                          | configuration file
>> > maintenance_work_mem         | 768MB                       | configuration file
>> > max_connections              | 500                         | configuration file
>> > max_stack_depth              | 2MB                         | environment variable
>> > port                         | 5432                        | command line
>> > shared_buffers               | 4GB                         | configuration file
>> > ssl                          | on                          | configuration file
>> > TimeZone                     | Asia/Kolkata                | command line
>> > timezone_abbreviations       | Default                     | command line
>> > wal_buffers                  | 16MB                        | configuration file
>> > work_mem                     | 48MB                        | configuration file
>> >
>> > It's also probably worth noting that postgres is installed on an
>> encrypted volume which is mounted using ecryptfs.
>> >
>> > Operating system and version:
>> >
>> > RedHatEnterpriseServer, version 6.6
>> >
>> > What program you're using to connect to PostgreSQL:
>> >
>> > Python (django)
>> >
>> > Is there anything relevant or unusual in the PostgreSQL server logs?:
>> >
>> > I see lots of instances of this error (and similar). I'm not sure what
>> else I should be looking for.
>> >
>> > What you were doing when the error happened / how to cause the error:
>> >
>> > I haven't explicitly tried to reproduce it, but it seems to
>> consistently happen with certain queries. However, the system was rebooted
>> shortly before the errors started occurring. The system was rebooted because
>> another database (elasticsearch) was having problems on the same machine
>> and the reboot was to attempt to resolve things.
>> >
>> > The EXACT TEXT of the error message you're getting, if there is one:
>> >
>> > DatabaseError: invalid page header in block 1 of relation
>> base/16384/76623
>> >
>> > (block and relation numbers change)
>> >
>> > Unfortunately, I'm not completely familiar with the CPU and disk/RAID
>> configurations used on the server. However it is storing to a (software)
>> encrypted volume as mentioned above.
>> >
>> > Have you ever set fsync=off in the postgresql config file?
>> > No
>> > Have you had any unexpected power loss lately? Replaced a failed RAID
>> disk? Had an operating system crash?
>> > Not recently, though the system did reboot normally as described above.
>> > Have you run a file system check? (chkdsk / fsck)
>> > No.
>> > Are there any error messages in the system logs?
>> (unix/linux: dmesg, /var/log/syslog ;
>> > I haven't seen anything obvious but I wasn't sure what to look for.
>> >
>>
>> I think you left out the kernel details (RHEL version and kernel patch
>> level).
>> This command will give you the kernel patch level:
>>
>> uname -a
>>
>> I once faced this issue when I was on a buggy patch level of the Linux
>> kernel. I just had to update to the latest patch. That worked for me.
>>
>
>
