Re: Checksum errors in pg_stat_database

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Checksum errors in pg_stat_database
Date: 2022-12-11 20:18:42
Message-ID: CABUevExGXxStJaM0hLQY_kht_S3HnszgVH1=zk0xcx5ccz7tBQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 8, 2022 at 2:35 PM Drouvot, Bertrand <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:

>
>
> On 4/2/19 7:06 PM, Magnus Hagander wrote:
> > On Tue, Apr 2, 2019 at 8:47 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> >
> > On Tue, Apr 02, 2019 at 07:43:12AM +0200, Julien Rouhaud wrote:
> > > On Tue, Apr 2, 2019 at 6:56 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> > >> One thing which is not
> > >> proposed on this patch, and I am fine with it as a first draft, is
> > >> that we don't have any information about the broken block number and
> > >> the file involved. My gut tells me that we'd want a separate view,
> > >> like pg_stat_checksums_details with one tuple per (dboid, rel, fork,
> > >> blck) to be complete. But that's just for future work.
> > >
> > > That could indeed be nice.
> >
> > Actually, backpedaling on this one... pg_stat_checksums_details may
> > be a bad idea as we could finish with one row per broken block. If
> > a corruption is spreading quickly, pgstat would not be able to sustain
> > that amount of objects. Having pg_stat_checksums would allow us to
> > plugin more data easily based on the last failure state:
> > - last relid of failure
> > - last fork type of failure
> > - last block number of failure.
> > Not saying to do that now, but having that in pg_stat_database does
> > not seem very natural to me. And on top of that we would have an
> > extra row full of NULLs for shared objects in pg_stat_database if we
> > adopt the unique view approach... I find that rather ugly.
> >
> >
> > I think that tracking each and every block is of course a non-starter,
> > as you've noticed.
>
> I think that's less of a concern now that the stats collector process has
> gone and that the stats are now collected in shared memory, what do you
> think?
>

It would be less of a concern, yes, but I think it would still be a concern.
If you have a large amount of corruption, you could quickly get to millions
of rows to keep track of, which would definitely be a problem in shared
memory as well, wouldn't it?

But perhaps we could keep a list of "the last 100 checksum failures" or
something like that?
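
To make the bounded-list idea concrete, here is a minimal sketch, written in
Python for brevity rather than as backend C. Everything in it is hypothetical:
the ChecksumFailure record, the ChecksumFailureLog class, and the OIDs are
illustration only and do not correspond to any existing PostgreSQL structure.
The fields simply follow the (dboid, rel, fork, blck) tuple mentioned upthread.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecksumFailure:
    # Hypothetical record layout, modelled on the (dboid, rel, fork, blck)
    # tuple mentioned earlier in the thread.
    dboid: int
    relid: int
    fork: str
    blkno: int


class ChecksumFailureLog:
    """Keeps only the most recent `capacity` failures plus a running total."""

    def __init__(self, capacity: int = 100) -> None:
        # A deque with maxlen silently drops the oldest entry once it is full,
        # so memory use stays fixed no matter how many failures are recorded.
        self._recent = deque(maxlen=capacity)
        self.total = 0

    def record(self, failure: ChecksumFailure) -> None:
        self._recent.append(failure)
        self.total += 1

    def recent(self) -> list:
        # Oldest retained failure first, newest last.
        return list(self._recent)


# Simulate 250 failures on one relation (made-up OIDs).
log = ChecksumFailureLog(capacity=100)
for blkno in range(250):
    log.record(ChecksumFailure(dboid=16384, relid=16385, fork="main", blkno=blkno))
```

The total counter keeps growing while the list itself stays capped at 100
entries, so the cost is fixed regardless of how far the corruption spreads.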

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
