Re: block corruption on slave db.

From: Keith <keith(at)keithf4(dot)com>
To: vinod kale <vinod16895(at)gmail(dot)com>
Cc: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: block corruption on slave db.
Date: 2020-06-16 04:54:20
Message-ID: CAHw75vuQ0zOPAp2iqjwg7O7qjY49R=SNqc=yorsYn0AE2nK-7w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

On Mon, Jun 15, 2020 at 9:45 AM vinod kale <vinod16895(at)gmail(dot)com> wrote:

> Hello Team,
>
> I have to clear some doughts regarding block corruption on slave db.
>
> If m having streaming replication in my environment. And in some moments
> corruption happened in my slave server's data then how will i come to know
> the corruption occurred.
>
> Bcoz if same happened on primary db then i wll immediately come to know
> bcoz there is daily dump backup we take on primary and in dump block
> corruption detects.
>
> Bt my dought is how we can identify on slave db.
>
> Kindly help me in above.
>
> Thanks in advance.
>
> --
> Thanks & regards,
> Vinod Kale
>

There's many different kinds of corruption, but one fairly effective way
you can watch for some of it is by ensuring that page level checksums are
enabled. If you're on a version of PG prior to 12, this can only be done
during database initialization (when you run initdb), so if it wasn't
enabled before, you'd have to dump and restore your database into a newly
initialized instance.

If you are on PG12+, or you can upgrade to it, you can use the pg_checksum
tool to enable page level checksums on an offline database

https://www.postgresql.org/docs/12/app-pgchecksums.html

Also on PG12+, you have the ability to monitor if a page level checksum is
encountered from within the database by querying the pg_stat_database
catalog. This is useful for monitoring tools such as check_postgres(
https://bucardo.org/check_postgres/) or pgMonitor (
https://github.com/CrunchyData/pgmonitor) An example query from pgMonitor
to pull out the count and time since the last failure is as follows.

SELECT datname AS dbname
, checksum_failures AS count
, coalesce(extract(epoch from (now() - checksum_last_failure)), 0) AS
time_since_last_failure_seconds
FROM pg_catalog.pg_stat_database;

Note that if statistics are reset, the errors detected and recorded here
are also reset. That is also how you'd reset it if you encountered
corruption and fixed it. Prior to PG12, page level checksum errors were
only reported via the logs.

Keith

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Michaeldba@sqlexec.com 2020-06-16 10:48:43 Re: Deleting more efficiently from large partitions
Previous Message Wells Oliver 2020-06-16 01:39:02 Deleting more efficiently from large partitions