Re: pgsql: autovacuum: handle analyze for partitioned tables

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pgsql: autovacuum: handle analyze for partitioned tables
Date: 2021-04-08 17:57:45
Message-ID: 1901125.1617904665@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> On 2021-Apr-08, Tom Lane wrote:
>> Looks like this has issues under EXEC_BACKEND:
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2021-04-08%2005%3A50%3A08

> Hmm, I couldn't reproduce this under EXEC_BACKEND or otherwise, but I
> think this is unrelated to that, but rather a race condition.

Yeah. I hit this on another machine that isn't using EXEC_BACKEND,
and I concur it looks more like a race condition. I think the problem
is that autovacuum is calling find_all_inheritors() on a relation it
has no lock on, contrary to that function's API spec. find_all_inheritors
assumes the OID it's given is valid and locked, and adds it to the
result list automatically. Then it looks for children, and won't find
any in the race case where somebody else just dropped the table.
So we come back to relation_needs_vacanalyze with a list of just the
original rel OID, and since this loop believes that every syscache fetch
it does will succeed, kaboom.

I do not think it is sane to do find_all_inheritors() with no lock,
so I'd counsel doing something about that rather than band-aiding the
symptom. On the other hand, it's also not really okay not to have
an if-test-and-elog after the SearchSysCache call. "A cache lookup
cannot fail" is not an acceptable assumption in my book.

BTW, another thing that looks like a race condition is the
extract_autovac_opts() call that is done a little bit earlier,
also without lock. I think this is actually safe, but it's ONLY
safe because we resisted the calls by certain people to add a
toast table to pg_class. Otherwise, fetching reloptions could
have involved a toast pointer dereference, and it would then be
racy whether the toasted data was still there. As-is, even if
the pg_class row we're looking at has been deleted, we can safely
disassemble its reloptions. I think this matter is deserving
of a comment at least.

regards, tom lane

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Alvaro Herrera 2021-04-08 18:35:51 Re: pgsql: autovacuum: handle analyze for partitioned tables
Previous Message Julien Rouhaud 2021-04-08 16:39:49 Re: pgsql: Move pg_stat_statements query jumbling to core.

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2021-04-08 18:04:10 Re: test runner (was Re: SQL-standard function body)
Previous Message Andres Freund 2021-04-08 17:50:39 test runner (was Re: SQL-standard function body)