Re: Statistics Import and Export

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org
Subject: Re: Statistics Import and Export
Date: 2025-01-18 17:00:04
Message-ID: CACJufxF_mytGCQKhycCHp2gwEse5WJvHWnJA-781c1BD8SsRkA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 17, 2025 at 10:20 PM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
>
> hi.
>
> SELECT * FROM pg_catalog.pg_restore_relation_stats(
> 'relation', 'public.tenk1_hundred'::regclass,
> 'version', '180000'::integer,
> 'relpages', '11'::integer,
> 'reltuples', '10000'::real,
> 'relallvisible', '0'::integer
> );
> dump and execute the above query generated a warning
> WARNING: missing lock for relation "tenk1_hundred" (OID 18431,
> relkind i) @ TID (15,34)
>
This seems to be an existing issue.
For pg_restore_relation_stats, we don't have regress tests for index relation.
I am not sure the WARNING is ok.

I found out that the previous mail attached no-cfbot
(refactor_pg_dump_onlyoption.no-cfbot)
refactoring of statistics, data, schema is not fully correct.
This email attached no-cfbot,
I think it is tuitive and correct refactor of handling these three options.

typedef struct _dumpOptions, typedef struct _restoreOptions
we already have three bools (dumpSchema, dumpData, dumpStatistics).
Why do we need three int (no_data, no_schema, no_statistics) fields
for these two structs?
since they represent the same information. (for example, no_data == 1,
means/imply dumpData is false)
(disclaimer, this part I didn't dig deeper).

doc/src/sgml/ref/pg_restore.sgml
<varlistentry>
<term><option>-X</option></term>
<term><option>--statistics-only</option></term>
<listitem>
<para>
Restore only the statistics, not schema (data definitions) or data.
</para>
<para>
(Do not confuse this with the <option>--schema</option> option, which
uses the word <quote>schema</quote> in a different meaning.)
</para>
</listitem>
</varlistentry>
here, we don't need to mention
"(Do not confuse this with the <option>--schema</option> option, which"... part?

--- a/src/bin/pg_dump/pg_restore.c
+++ b/src/bin/pg_dump/pg_restore.c
@@ -108,6 +112,7 @@ main(int argc, char **argv)
{"username", 1, NULL, 'U'},
{"verbose", 0, NULL, 'v'},
{"single-transaction", 0, NULL, '1'},
+ {"statistics-only", no_argument, NULL, 'P'},

Here it should be
+ {"statistics-only", no_argument, NULL, 'X'},
?

If we introduced REQ_STATS, then better checking all the REQ_DATA occurrences,
does REQ_STATS apply to there also?
for example in pg_backup_tar.c and pg_backup_directory.c,
functions: WriteToc, function WriteDataChunks, RestoreArchive.

-----------------------------------------------
I tested locally, dump, restore, directory, custom format is not
working as intended, with v38.
I use the following to test it.
CONN2 is my local connect string.
BIN2 is a local bin directory.
varchar_tbl.dir is directory format dump full output, including data,
schema, statistics.
-----------------------------------------------
${CONN2} -c 'drop table varchar_tbl;'
$BIN2/pg_restore --dbname=src2 --list varchar_tbl.dir
#only schema
$BIN2/pg_restore --dbname=src2 --format=directory --no-statistics
--no-data varchar_tbl.dir
${CONN2} -c 'select attname=$$f1$$ as expect_zero_row from pg_stats
where tablename = $$varchar_tbl$$;'
${CONN2} -c 'select (reltuples < 0 and relpages = 0) as expect_true
from pg_class where relname = $$varchar_tbl$$;'
#only data
$BIN2/pg_restore --dbname=src2 --format=directory --no-statistics
--no-schema varchar_tbl.dir
${CONN2} -c 'select attname=$$f1$$ as expect_zero_row from pg_stats
where tablename = $$varchar_tbl$$;'
${CONN2} -c 'select (reltuples < 0 and relpages = 0) as expect_true
from pg_class where relname = $$varchar_tbl$$;'
#only statistics
$BIN2/pg_restore --dbname=src2 --format=directory --statistics-only
varchar_tbl.dir
${CONN2} -c 'select attname=$$f1$$ as expect_zero_row from pg_stats
where tablename = $$varchar_tbl$$;'
${CONN2} -c 'select reltuples > 0 and relpages > 0 as expect_true from
pg_class where relname = $$varchar_tbl$$;'

Attachment Content-Type Size
v1-0001-misc-minor-refactoring.no-cfbot application/octet-stream 2.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Treat 2025-01-18 17:10:23 Re: Eagerly scan all-visible pages to amortize aggressive vacuum
Previous Message Tomas Vondra 2025-01-18 16:51:08 Re: Confine vacuum skip logic to lazy_scan_skip