Re: Statistics Import and Export

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Statistics Import and Export
Date: 2023-12-27 15:29:01
Message-ID: ZYxCvVTexwVww8Gs@momjian.us
Lists: pgsql-hackers

On Wed, Dec 27, 2023 at 01:08:47PM +0100, Tomas Vondra wrote:
> On 12/26/23 20:19, Tom Lane wrote:
> > Bruce Momjian <bruce(at)momjian(dot)us> writes:
> >> I think we need a robust API to handle two cases:
> >
> >> * changes in how we store statistics
> >> * changes in how data type values are represented in the statistics
> >
> >> We have had such changes in the past, and I think these two issues are
> >> what have prevented import/export of statistics up to this point.
> >> Developing an API that doesn't cleanly handle these will cause long-term
> >> pain.
> >
> > Agreed.
> >
>
> I agree the format is important - we don't want to end up with a format
> that's cumbersome or inconvenient to use. But I don't think the proposed
> format is particularly bad in those respects - it mostly reflects how we
> store statistics, and if I were designing a format for humans it might
> look a bit different. But that's not the goal here, IMHO.
>
> I don't quite understand the two cases above. Why should this affect how
> we store statistics? Surely, making the statistics easy to use for the
> optimizer is much more important than occasional export/import.

The two items above were to focus on getting a solution that can easily
handle future statistics storage changes. I figured we would want to
manipulate the data as a table internally, so I am confused why we would
export JSON instead of a COPY format. I didn't think we were changing
how we internally store or use the statistics.
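
Just to illustrate what I mean by a COPY format --- a sketch only, the
table name is a placeholder, and this is nothing more than the existing
catalog rows:

    -- illustrative sketch, not a proposal: export the existing
    -- pg_statistic rows for one table as COPY data
    COPY (
        SELECT s.*
        FROM pg_catalog.pg_statistic s
        WHERE s.starelid = 'public.some_table'::regclass
    ) TO STDOUT;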

> >> In summary, I think we need an SQL-level command for this.
> >
> > I think a SQL command is an actively bad idea. It'll just add development
> > and maintenance overhead that we don't need. When I worked on this topic
> > years ago at Salesforce, I had things set up with simple functions, which
> > pg_dump would invoke by writing more or less
> >
> > SELECT pg_catalog.load_statistics(....);
> >
> > This has a number of advantages, not least of which is that an extension
> > could plausibly add compatible functions to older versions. The trick,
> > as you say, is to figure out what the argument lists ought to be.
> > Unfortunately I recall few details of what I wrote for Salesforce,
> > but I think I had it broken down in a way where there was a separate
> > function call occurring for each pg_statistic "slot", thus roughly
> >
> > load_statistics(table regclass, attname text, stakind int, stavalue ...);
> >
> > I might have had a separate load_statistics_xxx function for each
> > stakind, which would ease the issue of deciding what the datatype
> > of "stavalue" is. As mentioned already, we'd also need some sort of
> > version identifier, and we'd expect the load_statistics() functions
> > to be able to transform the data if the old version used a different
> > representation. I agree with the idea that an explicit representation
> > of the source table attribute's type would be wise, too.
> >
>
> Yeah, this is pretty much what I meant by "functional" interface. But if
> I said the format implemented by the patch is maybe too close to how we
> store the statistics, then this has exactly the same issue. And it has
> other issues too - it breaks the stats down into multiple function
> calls, so ensuring the sanity/correctness of whole sets of statistics
> gets much harder, I think.

I was suggesting an SQL command because I am afraid this feature is
going to need a lot of options and do a lot of different things, and a
single function might be too complex to manage.
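
To illustrate --- the syntax below is purely hypothetical, none of these
keywords exist --- a command gives us room to hang options and clauses
off of it in a way a single function signature does not:

    -- purely hypothetical syntax, only to show where the options
    -- would attach; not an actual proposal
    IMPORT STATISTICS FOR TABLE public.t (id, name)
        FROM '/path/to/stats.dat'
        WITH (format 'copy', server_version 170000, on_mismatch 'transform');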

> I'm not sure about the extension idea. Yes, we could have an extension
> providing such functions, but do we have any precedent of making
> pg_upgrade dependent on an external extension? I'd much rather have
> something built-in that just works, especially if we intend to make it
> the default behavior (which I think should be our aim here).

Uh, an extension seems nice as a way to let people in back branches
install this, but not for normal usage.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.
