Re: ANALYZE ONLY

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Michael Harris <harmic(at)gmail(dot)com>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, postgres(at)jeltef(dot)nl, ilya(dot)evdokimov(at)tantorlabs(dot)com
Subject: Re: ANALYZE ONLY
Date: 2024-09-24 01:00:15
Message-ID: CACJufxGPeivYnJjGpQf5YP-ANEBTsas5-jx4eknXkFbj7g4c_w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 23, 2024 at 7:53 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Mon, 23 Sept 2024 at 23:23, jian he <jian(dot)universality(at)gmail(dot)com> wrote:
> > looks fine. but maybe we can add the following information
> > "if The <literal>ONLY</literal> is specified, the second set of
> > statistics won't include each children individual statistics"
> > I think that's the main difference between specifying ONLY or not?
>
> Ok, I think you're not understanding this yet and I'm not sure what I
> can make more clear in the documents.
>
> Let me explain... For inheritance parent tables, ANALYZE ONLY will
> gather inheritance and non-inheritance statistics for ONLY the parent.
>
> Here's an example of that:
>
> drop table if exists parent,child;
> create table parent(a int);
> create table child () inherits (parent);
> insert into parent values(1);
> insert into child values(1);
>
> analyze ONLY parent;
> select starelid::regclass,stainherit,stadistinct from pg_statistic
> where starelid in ('parent'::regclass,'child'::regclass);
> starelid | stainherit | stadistinct
> ----------+------------+-------------
> parent | f | -1 <- this is the distinct estimate
> for SELECT * FROM ONLY parent;
> parent | t | -0.5 <- this is the distinct estimate
> for SELECT * FROM parent;
> (2 rows)
>
> For the stainherit==false stats, only 1 row is sampled here as that's
> the only row directly located in the "parent" table.
> For the stainherit==true stats, 2 rows are sampled, both of them have
> "a" == 1. The stadistinct reflects that fact.
>
> Note there have been no statistics recorded for "child". However,
> analyze did sample rows in that table as part of gathering sample rows
> for "parent" for the stainherit==true row.
>
> Now let's try again without ONLY.
>
> analyze parent;
>
> select starelid::regclass,stainherit,stadistinct from pg_statistic
> where starelid in ('parent'::regclass,'child'::regclass);
> starelid | stainherit | stadistinct
> ----------+------------+-------------
> parent | f | -1
> parent | t | -0.5
> child | f | -1
> (3 rows)
>
> All of the above rows were re-calculated with the "analyze parent"
> command, the first two rows have the same values as nothing changed in
> the table, however, there are now statistics stored for the "child"
> table.

thanks for your explanation!
now I don't have any questions about this patch.

> > catalog-pg-statistic.html second paragraph seems very clear to me.
> > Maybe we can link it somehow
>
> I don't know what "link it" means in this context.
>

i mean, change to:

By default,
<command>ANALYZE</command> will also recursively collect and update the
statistics for each inheritance child table. The <literal>ONLY</literal>
keyword may be used to disable this.
You may also refer to catalog <link
linkend="catalog-pg-statistic"><structname>pg_statistic</structname></link>
description about <literal>stainherit</literal>.

but <link linkend="catalog-pg-statistic"><structname>pg_statistic</structname></link>
already mentioned once.
maybe not a good idea.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2024-09-24 01:37:49 Re: Pgoutput not capturing the generated columns
Previous Message David Rowley 2024-09-24 00:30:59 Re: Proposal to Enable/Disable Index using ALTER INDEX