Re: Statistics Import and Export

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Subject: Re: Statistics Import and Export
Date: 2024-03-30 00:54:20
Message-ID: CAOuzzgqwxpq=3aNF_j5rvYmb71BHY4niZfpeJZFa-QLyLA1GFg@mail.gmail.com
Lists: pgsql-hackers

Greetings,

On Fri, Mar 29, 2024 at 19:35 Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> On Fri, 2024-03-29 at 18:02 -0400, Stephen Frost wrote:
> > I’d certainly think “with stats” would be the preferred default of
> > our users.
>
> I'm concerned there could still be paths that lead to an error. For
> pg_restore, or when loading a SQL file, a single error isn't fatal
> (unless -e is specified), but it still could be somewhat scary to see
> errors during a reload.

I understand that point.

> Also, it's new behavior, so it may cause some minor surprises, or there
> might be minor interactions to work out. For instance, dumping stats
> doesn't make a lot of sense if pg_upgrade (or something else) is just
> going to run analyze anyway.

But we don’t expect anything to run analyze … do we? So I’m not sure why
it makes sense to raise this as a concern.

> What do you think about starting off with it as non-default, and then
> switching it to default in 18?

What’s different, given the above arguments, in making the change with 18
instead of now? I also suspect that if we say “we will change the default
later” … that later won’t ever come, and our users will end up always
having to remember to say “with-stats” instead.

The stats are important, which is why the effort is being made in the first
place. If just doing an analyze after loading the data were good enough,
then this wouldn’t be getting worked on.

Independently, I had a thought about doing an analyze as the data is being
loaded... but we can’t do that for indexes (though we could perhaps analyze
the indexed values as we build the index). This would work when we truncate
or create the table in the same transaction, so we would tie into some of
the existing logic that we have around that. We would also adjust COPY to
accept an option that specifies the anticipated number of rows being loaded
(which we can reasonably figure out during the dump phase). Perhaps this
would lead to a pg_dump option to do the data load as a transaction with a
truncate before the copy (the point being to still allow parallel load
while getting the benefits of knowing that we are completely reloading the
table). These are just some other thoughts, which I don’t intend to take
away from the current effort at all; I see that effort as valuable, and it
should be enabled by default.

Thanks!

Stephen

