Re: Statistics Import and Export

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, alvherre(at)alvh(dot)no-ip(dot)org
Subject: Re: Statistics Import and Export
Date: 2024-08-23 20:49:52
Message-ID: 11bb2c7403d4a7421fe8b09df6737aa83976266e.camel@j-davis.com
Lists: pgsql-hackers

On Thu, 2024-08-15 at 20:53 -0400, Corey Huinker wrote:
>
> >   * Remind me why the new stats completely replace the new row,
> > rather than updating only the statistic kinds that are specified?
>
> because:
> - complexity

I don't think it significantly impacts the overall complexity. We have
a ShareUpdateExclusiveLock on the relation, so there's no concurrency
to deal with, and an upsert operation is not many more lines of code.

> - we would then need a mechanism to then tell it to *delete* a
> stakind

That sounds useful regardless. I have introduced pg_clear_*_stats()
functions.

> - we'd have to figure out how to reorder the remaining stakinds, or
> spend effort finding a matching stakind in the existing row to know
> to replace it

Right. I initialized the values/nulls arrays based on the existing
tuple, if any, and created a set_stats_slot() function that searches
for either a matching stakind or the first empty slot.
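
As a rough illustration of that search, here is a standalone sketch. It is
not the code in the patch: find_stats_slot() is a made-up name, and the
plain int array stands in for the stakind1..stakind5 columns of a
pg_statistic tuple, where 0 marks an unused slot.

```c
#include <stdio.h>

/* pg_statistic has five slots: stakind1..stakind5 */
#define STATISTIC_NUM_SLOTS 5

/*
 * Hypothetical helper (not from the patch): return the index of the slot
 * that already holds "stakind", or else the index of the first empty slot
 * (stakind 0), or -1 if the kind is absent and every slot is occupied.
 */
static int
find_stats_slot(const int stakinds[STATISTIC_NUM_SLOTS], int stakind)
{
    int first_empty = -1;

    for (int i = 0; i < STATISTIC_NUM_SLOTS; i++)
    {
        /* An existing entry of the same kind always wins. */
        if (stakinds[i] == stakind)
            return i;

        /* Remember only the earliest unused slot as a fallback. */
        if (stakinds[i] == 0 && first_empty < 0)
            first_empty = i;
    }

    return first_empty;
}
```

Preferring a match over an earlier empty slot means a kind that is already
present gets overwritten in place rather than duplicated.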

> - "do what analyze does" was an initial goal and as a result many
> test cases directly compared pg_statistic rows from an original table
> to an empty clone table to see if the "copy" had fidelity.

Can't we just clear the stats first to achieve the same effect?

I have attached version 28j as one giant patch covering what was
previously 0001-0003. It's a bit rough (tests in particular need some
work), but it implements the logic to replace only those values
specified rather than the whole tuple.

At least for the interactive "set" variants of the functions, I think
it's an improvement. It feels more natural to just change one stat
without wiping out all the others. I realize a lot of the statistics
depend on each other, but the point is not to replace ANALYZE, the
point is to experiment with planner scenarios. What do others think?

For the "restore" variants, I'm not sure it matters a lot because the
stats will already be empty. If it does matter, we could pretty easily
define the "restore" variants to wipe out existing stats when loading
the table, though I'm not sure if that's a good thing or not.

I also made more use of FunctionCallInfo structures to communicate
between functions rather than huge parameter lists. I believe that
reduced the line count substantially, and made it easier to transform
the argument pairs in the "restore" variants into the positional
arguments for the "set" variants.
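
To make the pairs-to-positional idea concrete, here is a toy sketch in
plain C. It does not use the real FunctionCallInfo structure or anything
from the patch: ArgPair, pairs_to_positional(), and the three-argument
layout are all assumptions chosen only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical positional layout for a "set"-style call (illustrative). */
enum { ARG_RELPAGES, ARG_RELTUPLES, ARG_RELALLVISIBLE, NUM_ARGS };

static const char *const arg_names[NUM_ARGS] = {
    "relpages", "reltuples", "relallvisible"
};

/* Hypothetical name/value pair, as a "restore"-style call might receive. */
typedef struct
{
    const char *name;
    const char *value;
} ArgPair;

/*
 * Fill positional[] from the name/value pairs. Arguments that are never
 * named stay NULL. Returns the number of pairs whose name is not known.
 */
static int
pairs_to_positional(const ArgPair *pairs, size_t npairs,
                    const char *positional[NUM_ARGS])
{
    int unknown = 0;

    for (int i = 0; i < NUM_ARGS; i++)
        positional[i] = NULL;

    for (size_t p = 0; p < npairs; p++)
    {
        int found = 0;

        for (int i = 0; i < NUM_ARGS; i++)
        {
            if (strcmp(pairs[p].name, arg_names[i]) == 0)
            {
                positional[i] = pairs[p].value;
                found = 1;
                break;
            }
        }

        if (!found)
            unknown++;
    }

    return unknown;
}
```

In this sketch an unrecognized name is only counted, not rejected; whether
the real functions should warn or raise an error there is a separate
question.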

Regards,
Jeff Davis

Attachment Content-Type Size
v28j-0001-Create-functions-pg_set_relation_stats-pg_clear.patch text/x-patch 113.7 KB
