Quick Links

Re: Statistics Import and Export: difference in statistics dumped

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc:	Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, jian he <jian(dot)universality(at)gmail(dot)com>
Subject:	Re: Statistics Import and Export: difference in statistics dumped
Date:	2025-03-04 18:15:30
Message-ID:	ca81cbf6e6ea2af838df972801ad4da52640a503.camel@j-davis.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tue, 2025-03-04 at 10:28 +0530, Ashutosh Bapat wrote:

> >
> > What solution are you suggesting? The only one that comes to mind
> > is
> > moving everything to SECTION_POST_DATA, which is possible, but it
> > seems
> > like a big design change to satisfy a small detail.
>
> We don't have to do that. We can manage it by making statistics of
> index dependent upon the indexes on the table.

The index relstats are already dependent on the index definition. If
you have a simple database like:

CREATE TABLE t(i INT);
INSERT INTO t SELECT generate_series(1,10);
CREATE INDEX t_idx ON t (i);
ANALYZE;

and then you dump it, you get:

------- SECTION_PRE_DATA -------

CREATE TABLE public.t ...

------- SECTION_DATA -----------

COPY public.t (i) FROM stdin;
...
SELECT * FROM pg_catalog.pg_restore_relation_stats(
'version', '180000'::integer,
'relation', 'public.t'::regclass,
'relpages', '1'::integer,
'reltuples', '10'::real,
'relallvisible', '0'::integer
);
...

------- SECTION_POST_DATA ------

CREATE INDEX t_idx ON public.t USING btree (i);
SELECT * FROM pg_catalog.pg_restore_relation_stats(
'version', '180000'::integer,
'relation', 'public.t_idx'::regclass,
'relpages', '2'::integer,
'reltuples', '10'::real,
'relallvisible', '0'::integer
);

(section annotations added for clarity)

There is no problem with the index relstats, because they are already
dependent on the index definition, and will be restored after the
CREATE INDEX.

The issue is when the table's restored relstats are different from what
CREATE INDEX calculates, and then the CREATE INDEX overwrites the
table's just-restored relation stats. The easiest way to see this is
when restoring with --no-data, because CREATE INDEX will see an empty
table and overwrite the table's restored relstats with zeros.

If we view this issue as a dependency problem, then we'd have to make
the *table relstats* depend on the *index definition*. If a table has
any indexes, the relstats would need to go after the last index
definition, effectively moving most relstats to SECTION_POST_DATA. The
table's attribute stats would not be dependent on the index definition
(because CREATE INDEX doesn't touch those), so they could stay in
SECTION_DATA. And if the table doesn't have any indexes, then its
relstats could also stay in SECTION_DATA. But then we have a mess, so
we might as well just put all stats in SECTION_POST_DATA.

But I don't see it as a dependency problem. When I look at the above
SQL, it reads nicely to me and there's no obvious problem with it.

If we want stats to be stable, we need some kind of mode to tell the
server not to apply these kind of helpful optimizations, otherwise the
issue will resurface in some form no matter what we do with pg_dump. We
could invent a new mode, but autovacuum=off seems close enough to me.

Regards,
Jeff Davis

In response to

Re: Statistics Import and Export: difference in statistics dumped at 2025-03-04 04:58:16 from Ashutosh Bapat

Responses

Re: Statistics Import and Export: difference in statistics dumped at 2025-03-05 09:52:56 from Ashutosh Bapat

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Heikki Linnakangas	2025-03-04 18:22:02	Rename some signal and interrupt handling functions for consistency
Previous Message	Andrew Dunstan	2025-03-04 18:13:39	Re: SQL:2023 JSON simplified accessor support