Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6
Date: 2019-04-30 18:27:41
Message-ID: 20190430182741.s6o7mo5ebmgxxi5x@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2019-04-30 14:05:50 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > It's the lock-upgrade problem I theorized about
> > upthread. ReindexIndex(), via RangeVarCallbackForReindexIndex(), takes a
> > ShareLock on pg_class, and then goes on to upgrade to RowExclusiveLock
> > in RelationSetNewRelfilenode(). But at that time another session
> > obviously can already have the ShareLock and would also want to upgrade.
>
> Hmm. Note that this is totally independent of the deadlock mechanism
> I reported in my last message on this thread.

Yea :(

> > I'm not sure it's worth fixing this.
>
> I am not sure it's even *possible* to fix all these cases.

I think it's worth fixing the most common ones though. It sure sucks
that a plain REINDEX TABLE pg_class; isn't safe to run.

> Even if we could, it's out of scope for v12 let alone the back branches.

Unfortunately agreed. It's possible we could come up with a fix to
backpatch after maturing some, but certainly not before the release.

> I think the only practical solution is to remove those reindex tests.
> Even if we ran them in a script with no concurrent scripts, there'd
> be risk of failures against autovacuum, I'm afraid. Not often, but
> often enough to be annoying.

> Possibly we could run them in a TAP test that configures a cluster
> with autovac disabled?

Hm. Would it be sufficient to instead move them to a non-concurrent
test group, and stick a BEGIN; LOCK pg_class, ....; COMMIT; around it? I
think that ought to make it safe against autovacuum, and theoretically
there shouldn't be any overlapping pg_class/index updates that we'd need
to wait for?

This is a pretty finnicky area of the code, with obviously not enough
test coverage. I'm inclined to remove them from the back branches, and
try to get them working in master?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-04-30 18:35:51 Turning off enable_partition_pruning doesn't
Previous Message Tom Lane 2019-04-30 18:05:50 Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6