Re: Fixing or Mitigating this ERROR: invalid page in block 35217 of relation base/16421/3192429

From: Abdul Qoyyuum <aqoyyuum(at)cardaccess(dot)com(dot)bn>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Fixing or Mitigating this ERROR: invalid page in block 35217 of relation base/16421/3192429
Date: 2023-11-30 01:50:47
Message-ID: CAA3DN=WNOcwUgBQdGC6n99t6P25cDJQ9=W=SfOc3fdE9fWnHsg@mail.gmail.com
Lists: pgsql-general

Hi Stephen,

On Wed, Nov 29, 2023 at 5:53 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:

> Greetings,
>
> * Abdul Qoyyuum (aqoyyuum(at)cardaccess(dot)com(dot)bn) wrote:
> > Knowing that it's a data corruption issue, the only way to fix this is to
> > vacuum and reindex the database. What was suggested was the following:
> >
> > SET zero_damaged_pages = 0; # This is so that we can have the application
> > to continue to run
> > VACUUM FULL VERBOSE ANALYSE; # Do a full vacuum and analyse the problem if
> > possible.
> > REINDEX DATABASE "core"; # Then do a reindex and clean it up.
>
> This is only going to help if the issue is in an index, which isn't
> clear from what's been shared.
>

That is a good point. The problem is that I can't really find out, as the
logs aren't verbose enough to tell me more. Part of the log looks like this:

2023-11-29 04:27:17.486 [ERROR] [dispatcher-1095] [<redacted>] - ERROR: invalid page in block 35217 of relation base/16421/3192429
  Where: parallel worker
org.postgresql.util.PSQLException: ERROR: invalid page in block 35217 of relation base/16421/3192429
  Where: parallel worker
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2565) ~[postgresql-42.2.24.jre7.jar:42.2.24.jre7]
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2297) ~[postgresql-42.2.24.jre7.jar:42.2.24.jre7]
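
The path in that error can be decoded without touching the server. The sketch below is only an illustration: it assumes the default 8 kB block size and 1 GB segment size (both are compile-time options, so `SHOW block_size;` and `SHOW segment_size;` should be checked first), and the helper name `locate_block` is made up here. It splits `base/16421/3192429` into the database OID and the relfilenode, then works out which segment file and byte offset hold block 35217.

```python
import re

BLOCK_SIZE = 8192            # assumption: default block size (SHOW block_size)
BLOCKS_PER_SEGMENT = 131072  # assumption: default 1 GB segments of 8 kB blocks

ERROR_RE = re.compile(
    r"invalid page in block (\d+) of relation (base/(\d+)/(\d+))"
)


def locate_block(message):
    """Decode an 'invalid page' error into on-disk coordinates (hypothetical helper)."""
    match = ERROR_RE.search(message)
    if match is None:
        raise ValueError("not an 'invalid page' error for a base/ relation")
    block = int(match.group(1))
    segment, block_in_segment = divmod(block, BLOCKS_PER_SEGMENT)
    path = match.group(2)
    # Segment 0 has no suffix; later segments are named <filenode>.1, .2, ...
    segment_file = path if segment == 0 else f"{path}.{segment}"
    return {
        "database_oid": int(match.group(3)),
        "relfilenode": int(match.group(4)),
        "block": block,
        "segment_file": segment_file,
        "byte_offset": block_in_segment * BLOCK_SIZE,
    }


info = locate_block(
    "ERROR: invalid page in block 35217 of relation base/16421/3192429"
)
print(info)
```

With the relfilenode in hand, running `SELECT relname, relkind FROM pg_class WHERE oid = pg_filenode_relation(0, 3192429);` while connected to the database whose OID is 16421 should show whether the damaged relation is a table or an index. The `0` stands for the database's default tablespace, which is what the `base/` prefix implies.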

>
> > We're on Postgresql 12. This has worked before it happened (almost exactly
> > a year ago) and I think this needs a more permanent solution. I've looked
> > at routine vacuuming and checked the autovacuum is set to on and the
> > following configurations:
>
> This isn't something that should ever happen ...
>
> This also doesn't have anything to do with autovacuum, changing settings
> there won't make any difference.
>

Noted, but it has been running cleanly since I ran the vacuum and reindex
commands a year ago.

>
> > Can anyone advise if there's anything else we can do? We have no clue what
> > causes the invalid page block and we are running a High Availability
> > cluster set up but we are hoping that there may be a way to mitigate it.
>
> Was there some kind of hardware fault? Did you do a failover? Restore
> from a backup? Do you have checksums enabled? How many times has this
> happened before, and how many pages were impacted? What is the design
> of your HA solution, are you using PG replication or something else?

There have been a few maintenance operations earlier this year, but nothing
critical, and nothing failed that would have caused the database to become
corrupt. The HA solution we're using is Pacemaker in an active/passive
setup.

I'm not sure whether the relevant WAL settings from postgresql.conf are
useful, but here they are:

max_connections = 300
shared_buffers = 128MB

archive_mode = on
archive_command = 'test ! -f /opt/databases/postgres12/wal_archive/%f && cp %p /opt/databases/postgres12/wal_archive/%f'
hot_standby = on
wal_level = hot_standby
full_page_writes = on
max_wal_senders = 10
wal_keep_segments = 100 # 16MB per segment = 1.6GB
hot_standby = on
restart_after_crash = off
wal_receiver_status_interval = 2 # seconds
max_standby_streaming_delay = -1
max_standby_archive_delay = -1
synchronous_commit = on
hot_standby_feedback = on
wal_sender_timeout = 10000
wal_receiver_timeout = 10000
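
A quick pass over that fragment shows that `hot_standby` is listed twice. That is harmless here, since both lines say `on` and PostgreSQL only honours the last occurrence of a repeated parameter in the file, but it is worth tidying. The sketch below is a rough check, not a real postgresql.conf parser: it assumes one `name = value` per line with `#` starting a comment, it only uses a subset of the settings above, and the helper names are invented for illustration. It also redoes the `wal_keep_segments` arithmetic, assuming the default 16 MB WAL segment size.

```python
from collections import Counter

# Subset of the settings quoted above
FRAGMENT = """
hot_standby = on
wal_level = hot_standby
full_page_writes = on
max_wal_senders = 10
wal_keep_segments = 100 # 16MB per segment = 1.6GB
hot_standby = on
"""


def parse_settings(text):
    """Return (name, value) pairs; simplified, ignores quoting rules."""
    pairs = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        pairs.append((name, value))
    return pairs


def find_duplicates(pairs):
    """Names that appear more than once (hypothetical helper)."""
    counts = Counter(name for name, _ in pairs)
    return sorted(name for name, n in counts.items() if n > 1)


pairs = parse_settings(FRAGMENT)
duplicates = find_duplicates(pairs)
print(duplicates)

# WAL kept by wal_keep_segments, assuming 16 MB per segment
retained_mb = int(dict(pairs)["wal_keep_segments"]) * 16
print(retained_mb)
```

The retained size comes out as 100 segments times 16 MB, which matches the "1.6GB" comment in the config.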

Thanks,
>
> Stephen
>
