Re: Autovacuum on partitioned table (autoanalyze)

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: yuzuko <yuzukohosoya(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Amit Langote <amitlangote09(at)gmail(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>
Subject: Re: Autovacuum on partitioned table (autoanalyze)
Date: 2021-04-08 15:55:00
Message-ID: 5989f7c6-ee0d-cc52-7f3c-15e9186557b6@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 4/8/21 5:27 PM, Alvaro Herrera wrote:
> On 2021-Apr-08, Tomas Vondra wrote:
>
>> On 4/8/21 5:22 AM, Alvaro Herrera wrote:
>
>>> However, I just noticed there is a huge problem, which is that the new
>>> code in relation_needs_vacanalyze() is doing find_all_inheritors(), and
>>> we don't necessarily have a snapshot that lets us do that. While adding
>>> a snapshot acquisition at that spot is a very easy fix, I hesitate to
>>> fix it that way, because the whole idea there seems quite wasteful: we
>>> have to look up, open and lock every single partition, on every single
>>> autovacuum iteration through the database. That seems bad. I'm
>>> inclined to think that a better idea may be to store reltuples for the
>>> partitioned table in pg_class.reltuples, instead of having to add up the
>>> reltuples of each partition. I haven't checked if this is likely to
>>> break anything.
>>
>> How would that value get updated, for the parent?
>
> Same as for any other relation: ANALYZE would set it, after it's done
> scanning the table. We would to make sure that nothing resets it to
> empty, though, and that it doesn't cause issues elsewhere. (The patch I
> sent contains the minimal change to make it work, but of course that's
> missing having other pieces of code maintain it.)
>

So ANALYZE would inspect the child relations, sum the reltuples and set
it for the parent? IMO that'd be problematic because it'd mean we're
comparing the current number of changes with reltuples value which may
be arbitrarily stale (if we haven't analyzed the parent for a while).

That's essentially the issue I described when explaining why I think the
code needs to propagate the changes, reread the stats and then evaluate
which relations need vacuuming. It's similar to the issue of comparing
old changes_since_analyze vs. current reltuples, which is why the code
is rereading the stats before checking the thresholds. This time it's
the opposite direction - the reltuples might be stale.

FWIW I think the current refresh logic is not quite correct, because
autovac_refresh_stats does some throttling (STATS_READ_DELAY). It
probably needs a "force" parameter to ensure it actually reads the
current stats in this one case.

>>> (Also, a minor buglet: if we do ANALYZE (col1), then ANALYZE (col2) a
>>> partition, then we repeatedly propagate the counts to the parent table,
>>> so we would cause the parent to be analyzed more times than it should.
>>> Sounds like we should not send the ancestor list when a column list is
>>> given to manual analyze. I haven't verified this, however.)
>>
>> Are you sure? I haven't tried, but shouldn't this be prevented by only
>> sending the delta between the current and last reported value?
>
> I did try, and yes it behaves as you say.
>

OK, good.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2021-04-08 15:58:19 Re: VACUUM (DISABLE_PAGE_SKIPPING on)
Previous Message Simon Riggs 2021-04-08 15:41:41 Re: VACUUM (DISABLE_PAGE_SKIPPING on)