From: | Chris Travers <chris(dot)travers(at)adjust(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: recovering from "found xmin ... from before relfrozenxid ..." |
Date: | 2020-08-18 11:29:05 |
Message-ID: | CAN-RpxD8k_T4B8tgCWuu=0GnUEnN6NaepiUx8WMUPwUgHEgeuQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Jul 14, 2020 at 12:28 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Mon, Jul 13, 2020 at 2:12 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > 1. There's nothing to identify the tuple that has the problem, and no
> > way to know how many more of them there might be. Back-patching
> > b61d161c146328ae6ba9ed937862d66e5c8b035a would help with the first
> > part of this.
>
> I am in favor of backpatching such changes in cases where senior
> community members feel that it could help with hypothetical
> undiscovered data corruption issues -- if they're willing to take
> responsibility for the change. It certainly wouldn't be the first
> time. A "defense in depth" mindset seems like the right one when it
> comes to data corruption bugs. Early detection is really important.
>
> > Moreover, not everyone is as
> > interested in an extended debugging exercise as they are in getting
> > the system working again, and VACUUM failing repeatedly is a pretty
> > serious problem.
>
> That's absolutely consistent with my experience. Most users want to
> get back to business as usual now, while letting somebody else do the
> hard work of debugging.
>
Also even if you do trace the problem you still have to recover.
And sometimes I have found latent corruption from times when dbs were
running on older versions and older servers, making debugging largely a
futile exercise.
>
> > Therefore, one of my colleagues has - at my request - created a couple
> > of functions called heap_force_kill() and heap_force_freeze() which
> > take an array of TIDs.
>
> > So I have these questions:
> >
> > - Do people think it would me smart/good/useful to include something
> > like this in PostgreSQL?
>
> I'm in favor of it.
>
+1
Would be worth extending it with some functions to grab rows that have
various TOAST oids too.
>
> > - If so, how? I would propose a new contrib module that we back-patch
> > all the way, because the VACUUM errors were back-patched all the way,
> > and there seems to be no advantage in making people wait 5 years for a
> > new version that has some kind of tooling in this area.
>
> I'm in favor of it being *possible* to backpatch tooling that is
> clearly related to correctness in a fundamental way. Obviously this
> would mean that we'd be revising our general position on backpatching
> to allow some limited exceptions around corruption. I'm not sure that
> this meets that standard, though. It's hardly something that we can
> expect all that many users to be able to use effectively.
>
> I may be biased, but I'd be inclined to permit it in the case of
> something like amcheck, or pg_visibility, on the grounds that they're
> more or less the same as the new VACUUM errcontext instrumentation you
> mentioned. The same cannot be said of something like this new
> heap_force_kill() stuff.
>
> > - Any ideas for additional things we should include, or improvements
> > on the sketch above?
>
> Clearly you should work out a way of making it very hard to
> accidentally (mis)use. For example, maybe you make the functions check
> for the presence of a sentinel file in the data directory.
>
Agreed.
>
>
> --
> Peter Geoghegan
>
>
>
--
Best Regards,
Chris Travers
Head of Database
Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin
From | Date | Subject | |
---|---|---|---|
Next Message | Greg Nancarrow | 2020-08-18 11:32:00 | Re: Libpq support to connect to standby server as priority |
Previous Message | Ashutosh Sharma | 2020-08-18 11:21:38 | Re: recovering from "found xmin ... from before relfrozenxid ..." |