Re: Corrupt index stopping autovacuum system wide

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Aaron Pelz <aaronepelz(at)gmail(dot)com>, pgsql-general General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Corrupt index stopping autovacuum system wide
Date: 2019-07-17 18:43:05
Message-ID: 20190717184305.GA25848@alvherre.pgsql
Lists: pgsql-general

On 2019-Jul-17, Tom Lane wrote:

> Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> > On 2019-Jul-17, Peter Geoghegan wrote:
> >> Maybe nbtree VACUUM should do something more aggressive than give up
> >> when there is a "failed to re-find parent key" or similar condition.
> >> Perhaps it would make more sense to make the index inactive (for some
> >> value of "inactive") instead of just complaining. That might be the
> >> least worst option, all things considered.
>
> > Maybe we can mark an index as unvacuumable in some way? As far as I
> > understand, all queries using that index work, as do index updates; it's
> > just vacuuming that fails. If we mark the index as unvacuumable, then
> > vacuum just skips it (and does not run phase 3 for that table), and
> > things can proceed; the table's age can still be advanced. Obviously
> > it'll result in more bloat than in normal condition, but it shouldn't
> > cause the whole cluster to go down.
>
> If an index is corrupt enough to break vacuum, I think it takes a rather
> large leap of faith to believe that it's not going to cause problems for
> inserts or searches.

Maybe, but it's what happened in the reported case. (Note Aaron was
careful to do the index replacement concurrently -- he wouldn't have
done that if the table wasn't in active use.)

> I'd go with just marking the index broken and
> insisting that it be REINDEX'd before we touch it again.

This might make things worse operationally, though. If searches aren't
failing but vacuum is, we'd break a production system that currently
works.
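
To make that a bit more concrete, the check could live in the per-index
loop in vacuumlazy.c, along these lines (just a sketch: no such flag
exists today, index_is_unvacuumable() is a made-up helper, and the names
only approximate the current code):

    /*
     * Sketch only: index_is_unvacuumable() and the underlying catalog
     * flag do not exist; this just shows where the check would go.
     */
    for (i = 0; i < nindexes; i++)
    {
        if (index_is_unvacuumable(RelationGetRelid(Irel[i])))
        {
            ereport(WARNING,
                    (errmsg("skipping vacuum of index \"%s\": marked unvacuumable",
                            RelationGetRelationName(Irel[i]))));
            /* caller must also skip the heap-cleanup phase for this table */
            continue;
        }
        lazy_vacuum_index(Irel[i], &indstats[i], vacrelstats);
    }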

> (a) once the transaction's failed, you can't go making catalog updates;

Maybe we can defer the actual update to some other transaction -- say
register an autovacuum work-item, which can be executed separately.
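
Much like the BRIN autosummarization work items we already have, i.e.
something along these lines (AVW_MarkIndexUnvacuumable is hypothetical;
AVW_BRINSummarizeRange is the only type that exists today):

    #include "postmaster/autovacuum.h"

    /*
     * In the error path, don't touch the catalog in the failed
     * transaction; just queue a work item in shared memory and let a
     * later autovacuum worker do the pg_index update in its own
     * transaction.
     */
    if (!AutoVacuumRequestWork(AVW_MarkIndexUnvacuumable,   /* hypothetical type */
                               indexoid, InvalidBlockNumber))
        ereport(LOG,
                (errmsg("could not queue work item to mark index %u unvacuumable",
                        indexoid)));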

> (b) even when you know the transaction's failed, blaming it on a
> particular index seems a bit chancy;

Well, vacuum knows which index is being processed. Maybe you're thinking
that autovac can hit an out-of-memory condition or something like that;
perhaps we can limit the above to cases where an ERRCODE_DATA_CORRUPTED
condition is reported (and make sure all such conditions use that errcode
-- as far as I remember we have a patch for this particular error to be
reported as such).
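
As a sketch, the index pass could be wrapped so that only that errcode
triggers the marking (reusing the hypothetical work item from above):

    MemoryContext oldcontext = CurrentMemoryContext;

    PG_TRY();
    {
        lazy_vacuum_index(indrel, &stats, vacrelstats);
    }
    PG_CATCH();
    {
        ErrorData  *edata;

        MemoryContextSwitchTo(oldcontext);
        edata = CopyErrorData();

        /* only corruption errors trigger the "unvacuumable" marking */
        if (edata->sqlerrcode == ERRCODE_DATA_CORRUPTED)
            AutoVacuumRequestWork(AVW_MarkIndexUnvacuumable,    /* hypothetical */
                                  RelationGetRelid(indrel),
                                  InvalidBlockNumber);

        FreeErrorData(edata);
        PG_RE_THROW();
    }
    PG_END_TRY();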

> (c) automatically disabling constraint indexes seems less than desirable.

Disabling them for writes, yeah.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
