Re: Statistics Import and Export

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org
Subject: Re: Statistics Import and Export
Date: 2025-01-20 08:21:35
Message-ID: CACJufxGuRPfd70NwogkRE26cZcKGrR9gjG4OiH18tu=BnyezHA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 16, 2025 at 4:56 AM Corey Huinker <corey(dot)huinker(at)gmail(dot)com> wrote:
>>
>>
>>
>> I do like the idea of a "Statistics for ..." prefix, and I think it's
>> doable.
>
>
> And that's now implemented.
>
>
>> The caller needs some knowledge about that anyway, to correctly output
>> the statistics dump when the schema is not requested. Tests should
>> cover those cases, too.
>
>
> Tests for pg_dump --no-statistics and pg_dump --schema-only were added. Rebased to master as of today.
>
> I'm not completely happy with this patch, as I had to comment out one check in pg_backup_archiver that seemed necessary, but perhaps another set of eyes will set me straight.
>
> Attached is just the pg_dump stuff, and only for relation/attribute stats. The extended stats and vacuumdb work will be in their own threads going forward.

hi
The current v38 implementation allows statistics to be placed in
either SECTION_DATA or SECTION_POST_DATA.
IMHO, moving all statistics to the SECTION_POST_DATA section would
simplify things.
Attached is a patch that implements this change, (based on your patch,
obviously)

Reasoning making statistics in SECTION_POST_DATA are:
* statistics in multi sections will make the --section
handle statistics more harder for pg_dump and pg_restore.

* current doc in --schema-only says
"It is similar to, but for historical reasons not identical to, specifying
--section=pre-data --section=post-data.".
section span two sections making this sentence not less accurate.
Also, if we want to use --section option to dump all the statistics, we
need to use --data and --post-data together to get all the statistics.

* generally, --post-data section takes less time then --data section,
so putting it in SECTION_POST_DATA won't "cost" us that much.

* repairDependencyLoop, repairMatViewBoundaryMultiLoop is quite hard
to comprehend.
make statistics in SECTION_POST_DATA can make use not to think about
these changes.
also seems there is no materialized view statistics dump and restore
tests in src/bin/pg_dump/t/002_pg_dump.pl

* There are many REQ_DATA or REQ_SCHEMA occurrences, for each
occurrence, we may need to consider REQ_STATS.
make the statistics in SECTION_POST_DATA, then we don't need REQ_STATS.

what do you think?

This issue [1] seems not yet resolved.
[1] https://postgr.es/m/Z22kX5x2IhNb8kHE@momjian.us
even if pg_upgrade has options --with-statistics and --with-statistics.
we still need a sentence to mention which option is default?

Attachment Content-Type Size
v38-0001-make-statistics-dumped-at-SECTION_POST_DATA.no-cfbot application/octet-stream 51.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2025-01-20 08:26:11 Re: Re: proposal: schema variables
Previous Message Nisha Moond 2025-01-20 08:03:54 Re: Introduce XID age and inactive timeout based replication slot invalidation