Re: vacuumdb --analyze-in-stages

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Euler Taveira <euler(at)eulerto(dot)com>
Cc: "nikolai(dot)berkoff" <nikolai(dot)berkoff(at)pm(dot)me>, "pgsql-docs(at)lists(dot)postgresql(dot)org" <pgsql-docs(at)lists(dot)postgresql(dot)org>
Subject: Re: vacuumdb --analyze-in-stages
Date: 2021-10-18 23:02:11
Message-ID: 202110182302.42p3ddrfr7ck@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-docs

On 2021-Oct-18, Euler Taveira wrote:

> On Fri, Oct 8, 2021, at 6:33 AM, nikolai.berkoff wrote:
> > Running --analyze-in-stages on a database with statistics causes the
> > statistics to become significantly worse before they are improved.
> > This could be made clearer in the documentation.
>
> I think the paragraph you modified *already* stated that usefulness of
> --analyze-in-stages (newly populated from a restored dump or by pg_upgrade). I
> don't think your change is an improvement for --analyze-in-stages description.
> If you read the previous paragraph, it explains _how_ (3 stages of ANALYZE with
> different settings) and _why_ (produce usable statistics faster).

Yeah, but it doesn't say that the initial stats target (1) is the worst
value you could possibly have.

I wonder if it isn't better to *remove* the second phrase from this
paragraph, and leave just this text:

This option is useful to analyze a database that was newly populated
from a restored dump or by <command>pg_upgrade</command>.

to avoid giving the impression that it'll improve things if you run it
in a database with existing statistics. Or we could proactively warn
more explicitly about the problem:

This option is useful to analyze a database that was newly populated
from a restored dump or by <command>pg_upgrade</command>.
Beware that running with this option in a database with existing
statistics may cause query optimizer choices to become
transiently worse, because of the very low statistics target
that is used in the early stages.

Or maybe just

This option is useful to analyze a database that was newly populated
from a restored dump or by <command>pg_upgrade</command>.
Running with this option in a database with existing statistics
is not recommended.

Given that the first stage uses statistic target=1, running this option
in a database with any stats at all is probably a bad idea.

--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/

In response to

Responses

Browse pgsql-docs by date

  From Date Subject
Next Message David G. Johnston 2021-10-18 23:34:25 Re: vacuumdb --analyze-in-stages
Previous Message Euler Taveira 2021-10-18 20:15:59 Re: ALTER TABLE ... SET DATA TYPE removes statistics