Re: pg_stat_database.checksum_failures vs shared relations

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: pg_stat_database.checksum_failures vs shared relations
Date: 2025-03-28 03:24:14
Message-ID: Z-YWXul2kEck6UYH@jrouhaud
Lists: pgsql-hackers

On Thu, Mar 27, 2025 at 09:02:02PM -0400, Andres Freund wrote:
> Hi,
>
> On 2025-03-28 09:44:58 +0900, Michael Paquier wrote:
> > On Thu, Mar 27, 2025 at 12:06:45PM -0400, Robert Haas wrote:
> > > On Thu, Mar 27, 2025 at 11:58 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > > So, today we have the weird situation that *some* checksum errors on shared
> > > > relations get attributed to the current database (if they happen in a backend
> > > > normally accessing a shared relation), whereas others get reported to the
> > > > "shared relations" "database" (if they happen during a base backup). That
> > > > seems ... not optimal.
> > > >
> > > > One question is whether we consider this a bug that should be backpatched.
> > >
> > > I think it would be defensible if pg_basebackup reported all errors
> > > with OID 0 and backend connections reported all errors with OID
> > > MyDatabaseId, but it seems hard to justify having pg_basebackup take
> > > care to report things using the correct database OID and individual
> > > backend connections not take care to do the same thing. So I think
> > > this is a bug. If fixing it in the back-branches is too annoying, I
> > > think it would be reasonable to fix it only in master, but
> > > back-patching seems OK too.
> >
> > Being able to get a better reporting for shared relations in back
> > branches would be nice, but that's going to require some invasive
> > chirurgy, isn't it?
>
> Yea, that's what I was worried about too. I think we basically would need a
> PageIsVerifiedExtended2() that backs the current PageIsVerifiedExtended(),
> with optional arguments that the "fixed" callers would use.

While it would be nice, I'm not sure that it would really be worth the trouble.

Maybe that's just me, but if I hit a corruption failure, knowing whether it's a
global relation or a normal relation is definitely not something that will
radically change the following days / weeks of pain to fully resolve the
issue. Instead, there are other improvements that I would welcome on top
of fixing those counters, and they would affect such a new API.

For instance, one of the things you need to do in case of a corruption is to
understand the reason for the corruption, and for that knowing the underlying
tablespace rather than the database seems like much more useful information to
track. For the rest, the relfilelocator, forknum and blocknum should already
be reported in the logs, so you have the full details of what was detected
even if the pg_stat_database view is broken in the back branches.

But even if we had all that, there is still no guarantee (at least for now)
that we see all the corruption, as you might not read the "real" version of
the blocks if they are in shared buffers and/or in the OS cache, depending on
where the corruption actually happened.

And even if you could actually check what is physically stored on disk, that
probably wouldn't give you any strong guarantee that the rest of the data is
actually ok anyway. The biggest source of corruption I know of is an old vmware
bug usually referred to as the SEsparse bug, where on some occasions some blocks
would get written at the wrong location. In that case, the checksum can tell
me which blocks received the wrong write, but not which blocks the write
should have gone to, and those are entirely inconsistent too. That's clearly
out of postgres scope, but that's in my opinion just one of probably many
examples that make the current bug in back branches not worth spending too
much effort to fix.
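
To make the misdirected-write case concrete, here is a toy simulation. It is
only a sketch and not PostgreSQL code: the "disk" is a dict, the checksum is a
CRC32 stand-in that mixes in the block number (PostgreSQL's page checksum also
incorporates the block number, but uses a different algorithm), and the names
write_block and failing_blocks are made up for the illustration.

```python
import zlib


def checksum(data: bytes, blkno: int) -> int:
    """Toy checksum that mixes in the block number (not PostgreSQL's algorithm)."""
    return zlib.crc32(data + blkno.to_bytes(4, "little"))


def write_block(disk, intended_blkno, data, actual_blkno=None):
    """Store data checksummed for intended_blkno.

    A buggy storage layer may place it at actual_blkno instead.
    """
    target = intended_blkno if actual_blkno is None else actual_blkno
    disk[target] = (data, checksum(data, intended_blkno))


def failing_blocks(disk):
    """Return the block numbers whose stored checksum does not verify."""
    return [
        blkno
        for blkno, (data, stored) in sorted(disk.items())
        if checksum(data, blkno) != stored
    ]


# Hypothetical 4-block relation, all blocks written correctly at first.
disk = {}
for blkno in range(4):
    write_block(disk, blkno, f"v1-block-{blkno}".encode())

# An update of block 1 is misdirected by the storage layer onto block 3.
write_block(disk, 1, b"v2-block-1", actual_blkno=3)

bad = failing_blocks(disk)
stale = disk[1][0]

print("checksum failures:", bad)
print("content of block 1:", stale)
```

Only the block that was overwritten fails verification. Block 1 still holds its
old content with a perfectly valid checksum, so no checksum-based counter can
flag it, whichever database OID the failure ends up attributed to.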
