Re: Autovacuum on partitioned table (autoanalyze)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, yuzuko <yuzukohosoya(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Amit Langote <amitlangote09(at)gmail(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>
Subject: Re: Autovacuum on partitioned table (autoanalyze)
Date: 2021-08-17 10:49:50
Message-ID: 20210817104950.bkvv2cvttylpn4j3@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2021-08-16 17:42:48 -0400, Álvaro Herrera wrote:
> On 2021-Aug-16, Álvaro Herrera wrote:
>
> > Here's the reversal patch for the 14 branch. (It applies cleanly to
> > master, but the unused member of PgStat_StatTabEntry needs to be
> > removed and catversion bumped).
>
> I have pushed this to both branches. (I did not remove the item from
> the release notes in the 14 branch.)
>
> It upsets me to have reverted it, but after spending so much time trying
> to correct the problems, I believe it just wasn't salvageable within the
> beta-period code freeze constraints.

:(

> I described the issues I ran into
> in earlier messages; I think a good starting point to re-develop this is
> to revert the reversal commit, then apply my patch at
> https://postgr.es/m/0794d7ca-5183-486b-9c5e-6d434867cecd@www.fastmail.com
> then do something about the remaining problems that were complained
> about. (Maybe: add an "ancestor OID" member to PgStat_StatTabEntry so
> that the collector knows to propagate counts from children to ancestors
> when the upd/ins/del counts are received.

My suspicion is that it'd be a lot easier to implement this efficiently if
there were no propagation done outside of actually analyzing tables. I.e. have
do_autovacuum() build a hashtable of (parent_table_id, count) and use that to
make the analyze decisions. And then only propagate up the costs to parents of
tables when a child is analyzed (and thus looses its changes_since_analyze)
value. Then we can use hashtable_value + changes_since_analyze for
partitioning decisions of partitioned tables.

I've prototyped this, and it does seem to make do_autovacuum() cheaper. I've
attached that prototype, but note it's in a rough state.

However, unless we change the way inheritance parents are stored, it still
requires repetitive get_partition_ancestors() (or get_partition_parent())
calls in do_autovacuum(), which I think is problematic due to the index scans
you pointed out as well. The obvious way to address that would be to store
parent oids in pg_class - I suspect duplicating parents in pg_class is the
best way out, but pretty it is not.

> However, consider developing it as follow-up to Horiguchi-san's shmem
> pgstat rather than current pgstat implementation.)

+1

It might be worth to first tackle reusing samples from a relation's children
when building inheritance stats. Either by storing the samples somewhere (not
cheap) and reusing them, or by at least updating a partition's stats when
analyzing the parent.

Greetings,

Andres Freund

Attachment Content-Type Size
autovac-partitioned-via-hash.diff text/x-diff 19.0 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2021-08-17 10:50:22 Re: Autovacuum on partitioned table (autoanalyze)
Previous Message Amit Kapila 2021-08-17 10:08:44 Re: Added schema level support for publication.