Re: Statistics Import and Export

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Statistics Import and Export
Date: 2024-03-27 06:27:17
Message-ID: CADkLM=faR8sSe4zoA7pTOoukUd-x6V7inf=cqMOAfzpCs+=oxg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> 1) The docs say this:
>
> <para>
> The purpose of this function is to apply statistics values in an
> upgrade situation that are "good enough" for system operation until
> they are replaced by the next <command>ANALYZE</command>, usually via
> <command>autovacuum</command> This function is used by
> <command>pg_upgrade</command> and <command>pg_restore</command> to
> convey the statistics from the old system version into the new one.
> </para>
>
> I find this a bit confusing, considering the pg_dump/pg_restore changes
> are only in 0002, not in this patch.
>

True, I'll split the docs.

>
> 2) Also, I'm not sure about this:
>
> <parameter>relation</parameter>, the parameters in this are all
> derived from <structname>pg_stats</structname>, and the values
> given are most often extracted from there.
>
> How do we know where do the values come from "most often"? I mean, where
> else would it come from?
>

The next most likely sources would be 1. stats from another similar table
and 2. the imagination of a user testing hypothetical query plans.

>
> 3) The function pg_set_attribute_stats() is veeeeery long - 1000 lines
> or so, that's way too many for me to think about. I agree the flow is
> pretty simple, but I still wonder if there's a way to maybe split it
> into some smaller "meaningful" steps.
>

I wrestle with that myself. I think there's some pieces that can be
factored out.

> 4) It took me *ages* to realize the enums at the beginning of some of
> the functions are actually indexes of arguments in PG_FUNCTION_ARGS.
> That'd surely deserve a comment explaining this.
>

My apologies, it definitely deserves a comment.

>
> 5) The comment for param_names in pg_set_attribute_stats says this:
>
> /* names of columns that cannot be null */
> const char *param_names[] = { ... }
>
> but isn't that actually incorrect? I think that applies only to a couple
> initial arguments, but then other fields (MCV, mcelem stats, ...) can be
> NULL, right?
>

Yes, that is vestigial, I'll remove it.

>
> 6) There's a couple minor whitespace fixes or comments etc.
>
>
> 0002
> ----
>
> 1) I don't understand why we have exportExtStatsSupported(). Seems
> pointless - nothing calls it, even if it did we don't know how to export
> the stats.
>

It's not strictly necessary.

>
> 2) I think this condition in dumpTableSchema() is actually incorrect:
>
> IMO that's wrong, the statistics should be delayed to the post-data
> section. Which probably means there needs to be a separate dumpable
> object for statistics on table/index, with a dependency on the object.
>

Good points.

>
> 3) I don't like the "STATS IMPORT" description. For extended statistics
> we dump the definition as "STATISTICS" so why to shorten it to "STATS"
> here? And "IMPORT" seems more like the process of loading data, not the
> data itself. So I suggest "STATISTICS DATA".
>

+1

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2024-03-27 06:27:29 Re: pg_upgrade and logical replication
Previous Message Corey Huinker 2024-03-27 06:20:41 Re: Statistics Import and Export