Re: ANALYZE ONLY

From: Michael Harris <harmic(at)gmail(dot)com>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com
Subject: Re: ANALYZE ONLY
Date: 2024-09-20 01:20:24
Message-ID: CADofcAW43AD=qqtj1cLkTyRpPM6JB5ZALUK7CA1KZZqpcouoYw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the feedback, and sorry it has taken a few days to respond.

On Mon, 16 Sept 2024 at 12:29, jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> in https://www.postgresql.org/docs/current/ddl-inherit.html
> <<<
> Commands that do database maintenance and tuning (e.g., REINDEX,
> VACUUM) typically only work on individual, physical tables and do not
> support recursing over inheritance hierarchies. The respective
> behavior of each individual command is documented in its reference
> page (SQL Commands).
> <<<
> does the above para need some tweaks?

Yes, good catch.

> ANALYZE ONLY only_parted*;
> will fail.
> Maybe we need a sentence saying ONLY and * are mutually exclusive?

I used the same text that appears on other pages that are describing the
operation of ONLY / *, eg. the page for TRUNCATE
(https://www.postgresql.org/docs/current/sql-truncate.html)

I think it would be good to keep them all consistent if possible.

> >>>
> If the table being analyzed has inheritance children, ANALYZE gathers
> two sets of statistics: one on the rows of the parent table only, and
> a second including rows of both the parent table and all of its
> children. This second set of statistics is needed when planning
> queries that process the inheritance tree as a whole. The child tables
> themselves are not individually analyzed in this case.
> >>>
> is this still true for table inheritance.

The way I interpret this is that when you analyze an inheritance parent,
it will sample the rows of the parent and all its children. It will
prepare two sets
of stats, one based on the samples of any rows in the parent itself,
the other based on samples of rows in both parent and child tables.
This is distinct from the activity of updating the stats on the child
tables themselves.

> I am not sure if this output and the manual description about table
> inheritance is consistent.

With the test you performed, the result was a set of statistics for the
parent table which included samples from the child tables
(stainherit=t) but no entries for the parent table only
(stainherit=f) as there is no data in the parent table itself.
There are no statistics for only_inh_child.

If you add some records to the parent table and re-analyze, you
do get statistics with stainherit=f:

postgres=# insert into only_inh_parent(a,b) select g % 10, 100 from
generate_series(1,1000) g;
INSERT 0 1000
postgres=# select pg_table_size('only_inh_parent');
pg_table_size
---------------
65536
(1 row)

postgres=# ANALYZE ONLY only_inh_parent;
ANALYZE
postgres=# select stadistinct,starelid::regclass,staattnum, stainherit
from pg_statistic
where starelid = ANY ('{only_inh_parent, only_inh_child}'::regclass[]);
stadistinct | starelid | staattnum | stainherit
-------------+-----------------+-----------+------------
80 | only_inh_parent | 1 | t
200 | only_inh_parent | 2 | t
10 | only_inh_parent | 1 | f
1 | only_inh_parent | 2 | f
(4 rows)

and if you perform ANALYZE without ONLY:

postgres=# ANALYZE only_inh_parent;
ANALYZE
postgres=# select stadistinct,starelid::regclass,staattnum, stainherit
from pg_statistic
where starelid = ANY ('{only_inh_parent, only_inh_child}'::regclass[]);
stadistinct | starelid | staattnum | stainherit
-------------+-----------------+-----------+------------
80 | only_inh_parent | 1 | t
10 | only_inh_parent | 1 | f
1 | only_inh_parent | 2 | f
200 | only_inh_parent | 2 | t
80 | only_inh_child | 1 | f
-0.2 | only_inh_child | 2 | f
(6 rows)

Now we also have statistics for only_inh_child.

At least for me, that is consistent with the para you
quoted, except perhaps for this sentence:

>> The child tables themselves are not individually analyzed in this case.

This should probably read:

>> If the ONLY keyword is used, the child tables themselves are not individually analyzed.

I have attached a new version of the patch with this feedback incorporated.

Thanks again!

Cheers
Mike.

Attachment Content-Type Size
v6-0001-Implementation-of-the-ONLY-feature.patch application/octet-stream 18.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2024-09-20 01:57:57 Re: Robocopy might be not robust enough for never-ending testing on Windows
Previous Message Michael Paquier 2024-09-20 00:37:54 Re: Partitioned tables and [un]loggedness