Re: Statistics Import and Export

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, alvherre(at)alvh(dot)no-ip(dot)org
Subject: Re: Statistics Import and Export
Date: 2024-07-28 01:08:44
Message-ID: CADkLM=chb8YT+OD40aDDj1MGz52FMw6-SW3atgEpsw6Rp-3mhg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> The "restore" use case is the primary point of your patch, and that
> should be as simple and future-proof as possible. The parameters should
> be name/value pairs and there shouldn't be any "control" parameters --
> it's not the job of pg_dump to specify whether the restore should be
> transactional or in-place, it should just output the necessary stats.
>
> That restore function might be good enough to satisfy the "ad-hoc stats
> hacking" use case as well, but I suspect we want slightly different
> behavior. Specifically, I think we'd want the updates to be
> transactional rather than in-place, or at least optional.
>

Point well taken.

Both function pairs now call a generic internal function.

Which is to say that pg_set_relation_stats and pg_restore_relation_stats
both accept parameters in their own way, and both call
an internal function relation_statistics_update(), each with their own
defaults.

pg_set_relation_stats always leaves "version" NULL, does transactional
updates, and treats any data quality issue as an ERROR. This is is in line
with a person manually tweaking stats to check against a query to see if
the plan changes.

pg_restore_relation_stats does in-place updates, and steps down all errors
to warnings. The stats may not write, but at least it won't fail the
pg_upgrade for you.

pg_set_attribute_stats is error-maximalist like pg_set_relation_stats.
pg_restore_attribute_stats never had an in-place option to begin with.

>
> > The leading OUT parameters tell us the rel/attribute/inh affected (if
> > any), and which params had to be rejected for whatever reason. The
> > kwargs is the variadic key-value pairs that we were using for all
> > stat functions, but now we will be using it for all parameters, both
> > statistics and control, the control parameters will be:
>
> I don't like the idea of mixing statistics and control parameters in
> the same list.
>

There's no way around it, at least now we need never worry about a
confusing order for the parameters in the _restore_ functions because they
can now be in any order you like. But that speaks to another point: there
is no "you" in using the restore functions, those function calls will
almost exclusively be generated by pg_dump and we can all live rich and
productive lives never having seen one written down. I kid, but they're
actually not that gross.

Here is a -set function taken from the regression tests:

SELECT pg_catalog.pg_set_attribute_stats(
relation => 'stats_import.test'::regclass::oid,
attname => 'arange'::name,
inherited => false::boolean,
null_frac => 0.5::real,
avg_width => 2::integer,
n_distinct => -0.1::real,
range_empty_frac => 0.5::real,
range_length_histogram => '{399,499,Infinity}'::text
);
pg_set_attribute_stats
------------------------

(1 row)

and here is a restore function

-- warning: mcv cast failure
SELECT *
FROM pg_catalog.pg_restore_attribute_stats(
'relation', 'stats_import.test'::regclass::oid,
'attname', 'id'::name,
'inherited', false::boolean,
'version', 150000::integer,
'null_frac', 0.5::real,
'avg_width', 2::integer,
'n_distinct', -0.4::real,
'most_common_vals', '{2,four,3}'::text,
'most_common_freqs', '{0.3,0.25,0.05}'::real[]
);
WARNING: invalid input syntax for type integer: "four"
row_written | stats_applied | stats_rejected
| params_rejected
-------------+----------------------------------+--------------------------------------+-----------------
t | {null_frac,avg_width,n_distinct} |
{most_common_vals,most_common_freqs} |
(1 row)

There's a few things going on here:

1. An intentionally bad, impossible to write, value was put in
'most_common_vals'. 'four' cannot cast to integer, so the value fails, and
we get a warning
2. Because most_common_values failed, we can no longer construct a legit
STAKIND_MCV, so we have to throw out most_common_freqs with it.
3. Those failures aren't enough to prevent us from writing the other stats,
so we write the record, and report the row written, the stats we could
write, the stats we couldn't, and a list of other parameters we entered
that didn't make sense and had to be rejected (empty).

Overall, I'd say the format is on the pedantic side, but it's far from
unreadable, and mixing control parameters (version) with stats parameters
isn't that big a deal.

I do like the idea of returning a set, but I think it should be the
> positive set (effectively a representation of what is now in the
> pg_stats view) and any ignored settings would be output as WARNINGs.
>

Displaying the actual stats in pg_stats could get very, very big. So I
wouldn't recommend that.

What do you think of the example presented earlier?

Attached is v25.

Key changes:
- Each set/restore function pair now each call a common function that does
the heavy lifting, and the callers mostly marshall parameters into the
right spot and form the result set (really just one row).
- The restore functions now have all parameters passed in via a variadic
any[].
- the set functions now error out on just about any discrepancy, and do not
have a result tuple.
- test cases simplified a bit. There's still a lot of them, and I think
that's a good thing.
- Documentation to reflect significant reorganization.
- pg_dump modified to generate new function signatures.

Attachment Content-Type Size
v25-0001-Create-function-pg_set_relation_stats.patch text/x-patch 19.2 KB
v25-0004-Add-derivative-flags-dumpSchema-dumpData.patch text/x-patch 17.7 KB
v25-0002-Create-function-pg_set_attribute_stats.patch text/x-patch 90.3 KB
v25-0003-Create-functions-pg_restore_-_stats.patch text/x-patch 146.2 KB
v25-0005-Enable-dumping-of-table-index-stats-in-pg_dump.patch text/x-patch 33.9 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Joseph Koshakow 2024-07-28 01:10:23 Re: Fix overflow in pg_size_pretty
Previous Message David Rowley 2024-07-28 00:00:21 Re: Fix overflow in pg_size_pretty