On Tue, Dec 10, 2019 at 11:12:34AM +0100, Julien Rouhaud wrote:
> On Tue, Dec 10, 2019 at 3:26 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> >
> > On Mon, Dec 09, 2019 at 07:02:43PM +0100, Julien Rouhaud wrote:
> > > On Mon, Dec 9, 2019 at 5:21 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > >> Some people might prefer notices, because you can get those while the
> > >> thing is still running, rather than a result set, which you will only
> > >> see when the query finishes. Other people might prefer an SRF, because
> > >> they want to have the data in structured form so that they can
> > >> postprocess it. Not sure what you mean by "more globally."
> > >
> > > I meant having the results available system-wide, not only to the
> > > caller. I think that emitting a log/notice-level message should
> > > always be done on top of whatever other communication facility
> > > we're using.
> >
> > The problem with notices and logs is that they tend to be ignored.
> > That said, I don't see any problem in also adding something to the
> > logs, which can be found later on for parsing (say with pgbadger or
> > your friendly neighborhood grep), on top of an SRF returned to the
> > caller which includes all the corruption details. I think that any
> > backend function should also make sure to call
> > pgstat_report_checksum_failure() to make the failure visible at
> > database level in the catalogs, so that it is possible to use that
> > as a cheap high-level warning. The details of the failures could
> > always be dug out of the logs or the result of the function itself
> > after finding out that something is wrong in pg_stat_database.
>
> I agree that adding extra information in the logs and calling
> pgstat_report_checksum_failure is a must do, and I changed that
> locally. However, I doubt that the logs are the right place to find
> the details of corrupted blocks. There's no guarantee that the file
> will be accessible to the DBA, nor that the content won't get
> truncated by the time it's needed. I really think that corruption is
> important enough to justify a more specific location.

The cfbot reported a build failure, so here's a rebased v2, which also contains
the pgstat_report_checksum_failure() call and extra log info.
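
For illustration only, the "cheap high-level warning" discussed above could be
consumed on the monitoring side with something like the sketch below. It is not
part of the patch. It assumes PostgreSQL 12 or later, where pg_stat_database
exposes the checksum_failures and checksum_last_failure columns; the names
CHECK_QUERY and databases_with_failures are hypothetical, and how the query is
executed (psql, a driver, a monitoring agent) is left open.

```python
# Query a monitoring job could run. Assumes PostgreSQL 12+, where
# pg_stat_database has checksum_failures and checksum_last_failure.
CHECK_QUERY = """
SELECT datname, checksum_failures, checksum_last_failure
FROM pg_stat_database
"""


def databases_with_failures(rows):
    """Keep only the rows that report at least one checksum failure.

    rows: iterable of (datname, checksum_failures, checksum_last_failure)
    tuples, as returned by CHECK_QUERY. A missing counter (None) is
    treated as "nothing to report".
    """
    return [
        (datname, failures, last_failure)
        for datname, failures, last_failure in rows
        if failures is not None and failures > 0
    ]


if __name__ == "__main__":
    # Hand-written sample rows standing in for a real query result.
    sample = [
        ("app", 0, None),
        ("reporting", 3, "2019-12-10 11:00:00+01"),
    ]
    for datname, failures, last_failure in databases_with_failures(sample):
        print(f"{datname}: {failures} checksum failure(s), last at {last_failure}")
```

Such a check only says that something is wrong in a given database; the
per-block details would still have to come from the function's result or the
logs.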