Re: Statistics Import and Export commit related issue.

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: Statistics Import and Export commit related issue.
Date: 2025-02-27 20:50:21
Message-ID: CADkLM=dkRHCnPaOiGctC6YPQ4ztuab8bKZ0c4=rmA75e7tVbjw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 25, 2025 at 1:31 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:

> hi.
> the thread "Statistics Import and Export" is quite hot.
> so a new thread for some minor issue i found.
>
>
> static int
> _tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH)
> {
>
> if (strcmp(te->desc, "STATISTICS DATA") == 0)
> {
> if (!ropt->dumpStatistics)
> return 0;
> else
> res = REQ_STATS;
> }
>
> /* If it's statistics and we don't want statistics, maybe ignore it */
> if (!ropt->dumpStatistics && strcmp(te->desc, "STATISTICS DATA") == 0)
> return 0;
> }
> As you can see, there is some duplicate code there.
> in _tocEntryRequired, we can move `` if (strcmp(te->desc, "STATISTICS
> DATA") == 0)`` after switch (curSection).
> so pg_dump --section option does not influence statistics dump.
> attached is the proposed change.
>

+1, and I think the patch provided is good as-is.

>
>
> in dumpTableData_copy, we have:
> pg_log_info("dumping contents of table \"%s.%s\"",
> tbinfo->dobj.namespace->dobj.name, classname);
> dumpRelationStats add a pg_log_info would be a good thing.
> commit:
> https://git.postgresql.org/cgit/postgresql.git/commit/?id=1fd1bd871012732e3c6c482667d2f2c56f1a9395
> didn't have any pg_log_info.
>

I don't have a strong opinion on this one, but I lean against doing it. I
think table data got a log message because dumping table data is an
operation that takes up a major % of the dump's runtime, whereas statistics
are just some catalog operations. There are log messages for the global
catalog operations (ex. read in all user defined functions, read in all
inheritance relationships), but not per-table catalog operations. If we end
up going to a global export of attribute statistics then I'd definitely
want a log message for that operation.

>
>
> https://www.postgresql.org/docs/devel/app-pgrestore.html
> """
> --jobs=number-of-jobs
> Run the most time-consuming steps of pg_restore — those that load
> data, create indexes, or create constraints — concurrently, using up
> to number-of-jobs concurrent sessions.
> """
> we may need to change the above sentence?
> pg_restore restore STATISTICS DATA also doing it concurrently if
> --jobs is specified.

I also have no strong opinion here, but I think the sentence is fine as is
because it specifies "most time-consuming" and a single catalog row
operation is much smaller than a COPY load, or the sequential scans needed
for index builds and constraint verification.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2025-02-27 21:07:05 Re: moving some code out of explain.c
Previous Message Tom Lane 2025-02-27 20:45:33 Re: SQLFunctionCache and generic plans