From: | Magnus Hagander <magnus(at)hagander(dot)net> |
---|---|
To: | Julien Rouhaud <rjuju123(at)gmail(dot)com> |
Cc: | David Steele <david(at)pgmasters(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Checksum errors in pg_stat_database |
Date: | 2019-03-30 15:01:50 |
Message-ID: | CABUevEyE8kxFB3VAoUKD6iJ9N10ur5oQykMRRttHYbw-X-Cvkg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, Mar 30, 2019 at 3:55 PM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> On Sat, Mar 30, 2019 at 2:33 PM Magnus Hagander <magnus(at)hagander(dot)net>
> wrote:
> >
> > On Wed, Mar 13, 2019 at 4:54 PM Julien Rouhaud <rjuju123(at)gmail(dot)com>
> wrote:
> >>
> >> On Wed, Mar 13, 2019 at 4:53 PM Julien Rouhaud <rjuju123(at)gmail(dot)com>
> wrote:
> >> >
> >> > As a result I ended up simply adding counters for the number of total
> >> > checks and the timestamp of the last failure in PgStat_StatDBEntry,
> >> > making attached patch very lightweight. I moved all the checksum
> >> > related counters out of pg_stat_database in a new pg_stat_checksum
> >> > view. It avoids to make pg_stat_database too wide, and also allows to
> >> > display information about shared object in this new view (some of the
> >> > other counters don't really make sense for shared objects or could
> >> > break existing monitoring query). While at it, I tried to add a
> >> > little bit of documentation wrt. checksum monitoring.
> >>
> >> and of course I forgot to attach the patch.
> >
> >
> > Does it really make any sense to track "number of checksum checks"? In
> any sort of interesting database that's just going to be an insanely high
> number, isn't it? (And also, to stay consistent with checksum failures, we
> should of course also count the checks done in base backups, which is not
> in the patch. But I'm more thinking we should drop it)
>
> Thanks for looking at it!
>
> It's surely going to be a huge number on databases with a large number
> of buffer eviction and/or frequent pg_basebackup. The idea was to be
> able to know if the possible lack of failure was due to lack of check
> at all or because the server appears to be healthy, without spamming
> gettimeofday calls. If having a last_check() is better, I'm fine with
> it. If it's useless, let's drop it.
>
I'm not sure either of them are really useful, but would be happy to take
input from others :)
The number of checks was supposed to also be tracked in base_backups, with
>
Oh, that's a sloppy review. I see it's there. However, it doesn't appear to
count up in the *normal* backend path...
My vote is still to drop it completely, but if we're keeping it, it has to
go in both paths.
> Having thought some more about this, I wonder if the right thing to do is
> to actually add a row to pg_stat_database for the global stats, rather than
> invent a separate view. I can see the argument going both ways, but
> particularly with the name pg_stat_checksums we are setting a pattern that
> will create one view for each counter. That's not very good, I think.
> >
> > In the end I'm somewhat split on the idea of pg_stat_database with a
> NULL row or pg_stat_checkpoints. What do others think?
>
> I agree that having a separate view for each counter if a bad idea.
> But what I was thinking is that we'll probably end up with a view to
> track per-db online checksum activation progress/activity/status at
> some point (similar to pg_stat_progress_vacuum), so why not starting
> with this dedicated view right now and add new counters later, either
> in pgstat and/or some shmem, as long as we keep the view name as SQL
> interface.
>
Technically, that should be in pg_stat_progress_checksums to be consistent
:) So whichever way we turn, it's going to be inconsistent with something.
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2019-03-30 15:11:27 | Re: speeding up planning with partitions |
Previous Message | Julien Rouhaud | 2019-03-30 14:57:31 | Re: Checksum errors in pg_stat_database |