From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Statistics Import and Export |
Date: | 2023-12-27 15:29:01 |
Message-ID: | ZYxCvVTexwVww8Gs@momjian.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Dec 27, 2023 at 01:08:47PM +0100, Tomas Vondra wrote:
> On 12/26/23 20:19, Tom Lane wrote:
> > Bruce Momjian <bruce(at)momjian(dot)us> writes:
> >> I think we need a robust API to handle two cases:
> >
> >> * changes in how we store statistics
> >> * changes in how how data type values are represented in the statistics
> >
> >> We have had such changes in the past, and I think these two issues are
> >> what have prevented import/export of statistics up to this point.
> >> Developing an API that doesn't cleanly handle these will cause long-term
> >> pain.
> >
> > Agreed.
> >
>
> I agree the format is important - we don't want to end up with a format
> that's cumbersome or inconvenient to use. But I don't think the proposed
> format is somewhat bad in those respects - it mostly reflects how we
> store statistics and if I was designing a format for humans, it might
> look a bit differently. But that's not the goal here, IMHO.
>
> I don't quite understand the two cases above. Why should this affect how
> we store statistics? Surely, making the statistics easy to use for the
> optimizer is much more important than occasional export/import.
The two items above were to focus on getting a solution that can easily
handle future statistics storage changes. I figured we would want to
manipulate the data as a table internally so I am confused why we would
export JSON instead of a COPY format. I didn't think we were changing
how we internall store or use the statistics.
> >> In summary, I think we need an SQL-level command for this.
> >
> > I think a SQL command is an actively bad idea. It'll just add development
> > and maintenance overhead that we don't need. When I worked on this topic
> > years ago at Salesforce, I had things set up with simple functions, which
> > pg_dump would invoke by writing more or less
> >
> > SELECT pg_catalog.load_statistics(....);
> >
> > This has a number of advantages, not least of which is that an extension
> > could plausibly add compatible functions to older versions. The trick,
> > as you say, is to figure out what the argument lists ought to be.
> > Unfortunately I recall few details of what I wrote for Salesforce,
> > but I think I had it broken down in a way where there was a separate
> > function call occurring for each pg_statistic "slot", thus roughly
> >
> > load_statistics(table regclass, attname text, stakind int, stavalue ...);
> >
> > I might have had a separate load_statistics_xxx function for each
> > stakind, which would ease the issue of deciding what the datatype
> > of "stavalue" is. As mentioned already, we'd also need some sort of
> > version identifier, and we'd expect the load_statistics() functions
> > to be able to transform the data if the old version used a different
> > representation. I agree with the idea that an explicit representation
> > of the source table attribute's type would be wise, too.
> >
>
> Yeah, this is pretty much what I meant by "functional" interface. But if
> I said maybe the format implemented by the patch is maybe too close to
> how we store the statistics, then this has exactly the same issue. And
> it has other issues too, I think - it breaks down the stats into
> multiple function calls, so ensuring the sanity/correctness of whole
> sets of statistics gets much harder, I think.
I was suggesting an SQL command because this feature is going to need a
lot of options and do a lot of different things, I am afraid, and a
single function might be too complex to manage.
> I'm not sure about the extension idea. Yes, we could have an extension
> providing such functions, but do we have any precedent of making
> pg_upgrade dependent on an external extension? I'd much rather have
> something built-in that just works, especially if we intend to make it
> the default behavior (which I think should be our aim here).
Uh, an extension seems nice to allow people in back branches to install
it, but not for normal usage.
--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com
Only you can decide what is important to you.
From | Date | Subject | |
---|---|---|---|
Next Message | Nathan Bossart | 2023-12-27 15:36:47 | Re: trying again to get incremental backup |
Previous Message | Tom Lane | 2023-12-27 15:18:21 | Re: pg_upgrade failing for 200+ million Large Objects |