Re: help troubleshooting invalid page header error

From: Cory Zue <czue(at)dimagi(dot)com>
To: chiru r <chirupg(at)gmail(dot)com>
Cc: Sameer Kumar <sameer(dot)kumar(at)ashnik(dot)com>, PostgreSQL General Discussion Forum <pgsql-general(at)postgresql(dot)org>
Subject: Re: help troubleshooting invalid page header error
Date: 2014-12-26 22:20:24
Message-ID: CAF+m3tbK1HnE-WyG62hEnpW3kmf=OhEx8Nz8kHUoND-8DHn+GQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

(nevermind - it looks like the zero_damaged_pages setting only took for the
duration of the session)

On Fri, Dec 26, 2014 at 5:15 PM, Cory Zue <czue(at)dimagi(dot)com> wrote:

> Hi Chiru,
>
> I am trying to pg_dump the database to have a snapshot of the current
> state. I've turned on 'zero_damaged_pages' but pg_dump is still failing
> with an "invalid page header" error - this time from what looks like a
> sequence object that is auto-setting IDs on a table. Any advice on how to
> remove this error?
>
> Here is the full query that's failing:
>
> SELECT sequence_name, start_value, last_value, increment_by, CASE WHEN
> increment_by > 0 AND max_value = 9223372036854775807 THEN NULL WHEN
> increment_by < 0 AND max_value = -1 THEN NULL ELSE max_value END AS
> max_value, CASE WHEN increment_by > 0 AND min_value = 1 THEN NULL WHEN
> increment_by < 0 AND min_value = -9223372036854775807 THEN NULL ELSE
> min_value END AS min_value, cache_value, is_cycled, is_called from
> unfinishedsubmissionstub_id_seq
>
> On Fri, Dec 26, 2014 at 2:35 PM, chiru r <chirupg(at)gmail(dot)com> wrote:
>
>> Hi Cory,
>>
>> After recovering table turn off *zero_damaged_pages *parameter.
>>
>>
>> On Fri, Dec 26, 2014 at 9:13 PM, Cory Zue <czue(at)dimagi(dot)com> wrote:
>>
>>> Hi all,
>>>
>>> Thanks for the responses. Chiru, I'm looking into your suggestion.
>>>
>>> Sameer, here is the kernel version info:
>>>
>>> Linux dimagi 2.6.32-431.20.5.el6.x86_64 #1 SMP Wed Jul 16 05:26:53 EDT
>>> 2014 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> Does that seem like it could be a problematic version?
>>>
>>> More generally - I'm still wondering whether I should chalk this failure
>>> up to a transient/random issue, or whether I should be more worried about
>>> the hardware on the machine. According to our diagnostic tools, disk and
>>> memory are fine, but it's still not clear to me how it got into this state.
>>> Any general bits of information regarding the potential causes of these
>>> types of issues would be much appreciated.
>>>
>>> thanks,
>>> Cory
>>>
>>>
>>> On Fri, Dec 26, 2014 at 6:55 AM, Sameer Kumar <sameer(dot)kumar(at)ashnik(dot)com>
>>> wrote:
>>>
>>>> On 23 Dec 2014 12:05, "Cory Zue" <czue(at)dimagi(dot)com> wrote:
>>>> >
>>>> > Hi all,
>>>> >
>>>> > Our postgres instance on one of our production machines has recently
>>>> been returning errors of the form "DatabaseError: invalid page header in
>>>> block 1 of relation base/16384/76623" from normal queries. I've been
>>>> reading that these are often linked to hardware errors, but I would like to
>>>> better understand what else it could be or how to determine that for sure.
>>>> I've filled out the standard issue reporting template below. Any feedback
>>>> or troubleshooting instructions would be much appreciated.
>>>> >
>>>> > ---
>>>> > A description of what you are trying to achieve and what results you
>>>> expect.:
>>>> >
>>>> > Intermittent queries are failing with the error "DatabaseError:
>>>> invalid page header in block 1 of relation base/16384/76623"
>>>> >
>>>> > PostgreSQL version number you are running:
>>>> >
>>>> > PostgreSQL 8.4.13 on x86_64-redhat-linux-gnu, compiled by GCC gcc
>>>> (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit
>>>> >
>>>> > How you installed PostgreSQL:
>>>> >
>>>> > from standard package installer
>>>> >
>>>> > Changes made to the settings in the postgresql.conf file:
>>>> >
>>>> >
>>>> > name | current_setting |
>>>> source
>>>> >
>>>> ------------------------------+-----------------------------+----------------------
>>>> > checkpoint_completion_target | 0.9 |
>>>> configuration file
>>>> > checkpoint_segments | 32 |
>>>> configuration file
>>>> > checkpoint_timeout | 15min |
>>>> configuration file
>>>> > DateStyle | ISO, MDY |
>>>> configuration file
>>>> > default_text_search_config | pg_catalog.english |
>>>> configuration file
>>>> > effective_cache_size | 1GB |
>>>> configuration file
>>>> > lc_messages | en_US.UTF-8 |
>>>> configuration file
>>>> > lc_monetary | en_US.UTF-8 |
>>>> configuration file
>>>> > lc_numeric | en_US.UTF-8 |
>>>> configuration file
>>>> > lc_time | en_US.UTF-8 |
>>>> configuration file
>>>> > log_checkpoints | on |
>>>> configuration file
>>>> > log_connections | off |
>>>> configuration file
>>>> > log_destination | csvlog |
>>>> configuration file
>>>> > log_directory | /opt/data/pgsql/data/pg_log |
>>>> configuration file
>>>> > log_disconnections | off |
>>>> configuration file
>>>> > log_duration | on |
>>>> configuration file
>>>> > log_filename | postgres-%Y-%m-%d_%H%M%S |
>>>> configuration file
>>>> > log_lock_waits | on |
>>>> configuration file
>>>> > log_min_duration_statement | 250ms |
>>>> configuration file
>>>> > log_rotation_age | 1d |
>>>> configuration file
>>>> > log_rotation_size | 1GB |
>>>> configuration file
>>>> > log_temp_files | 0 |
>>>> configuration file
>>>> > log_timezone | Asia/Kolkata | command
>>>> line
>>>> > log_truncate_on_rotation | on |
>>>> configuration file
>>>> > logging_collector | on |
>>>> configuration file
>>>> > maintenance_work_mem | 768MB |
>>>> configuration file
>>>> > max_connections | 500 |
>>>> configuration file
>>>> > max_stack_depth | 2MB |
>>>> environment variable
>>>> > port | 5432 | command
>>>> line
>>>> > shared_buffers | 4GB |
>>>> configuration file
>>>> > ssl | on |
>>>> configuration file
>>>> > TimeZone | Asia/Kolkata | command
>>>> line
>>>> > timezone_abbreviations | Default | command
>>>> line
>>>> > wal_buffers | 16MB |
>>>> configuration file
>>>> > work_mem | 48MB |
>>>> configuration file
>>>> >
>>>> > It's also probably worth noting that postgres is installed on an
>>>> encrypted volume which is mounted using ecryptfs.
>>>> >
>>>> > Operating system and version:
>>>> >
>>>> > RedHatEnterpriseServer, version 6.6
>>>> >
>>>> > What program you're using to connect to PostgreSQL:
>>>> >
>>>> > Python (django)
>>>> >
>>>> > Is there anything relevant or unusual in the PostgreSQL server logs?:
>>>> >
>>>> > I see lots of instances of this error (and similar). I'm not sure
>>>> what else I should be looking for.
>>>> >
>>>> > What you were doing when the error happened / how to cause the error:
>>>> >
>>>> > I haven't explicitly tried to reproduce it, but it seems to
>>>> consistently happen with certain queries. However, the system was rebooted
>>>> shortly before the errors started occuring. The system was rebooted because
>>>> another database (elasticsearch) was having problems on the same machine
>>>> and the reboot was to attempt to resolve things.
>>>> >
>>>> > The EXACT TEXT of the error message you're getting, if there is one:
>>>> >
>>>> > DatabaseError: invalid page header in block 1 of relation
>>>> base/16384/76623
>>>> >
>>>> > (block and relation numbers change)
>>>> >
>>>> > Unfortunately, I'm not completely familiar with the CPU and disk/RAID
>>>> configurations used on the server. However it is storing to a (software)
>>>> encrypted volume as mentioned above.
>>>> >
>>>> > Have you ever set fsync=off in the postgresql config file?
>>>> > No
>>>> > Have you had any unexpected power loss lately? Replaced a failed RAID
>>>> disk? Had an operating system crash?
>>>> > Not recently, though the system did reboot normally as described
>>>> above.
>>>> > Have you run a file system check? (chkdsk / fsck)
>>>> > No.
>>>> > Are there any error messages in the system logs?
>>>> (unix/linux: dmesg, /var/log/syslog ;
>>>> > I haven't seen anything obvious but I wasn't sure what to look for.
>>>> >
>>>>
>>>> I guess you missed to provide the details and kernel version (rhel
>>>> version and kernel level).
>>>> This will give you kernel patch level-
>>>>
>>>> uname -a
>>>>
>>>> I had once faced this issue and I was on a buggy patch of Linux kernel.
>>>> I just had to update to latest patch. That worked for me.
>>>>
>>>
>>>
>>
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2014-12-27 00:13:50 Re: How to install pgAdmin 1.20 on Mint Rebecca?
Previous Message Edson Carlos Ericksson Richter 2014-12-26 22:16:45 Re: How to install pgAdmin 1.20 on Mint Rebecca?