From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org |
Subject: | Re: Statistics Import and Export |
Date: | 2025-03-06 18:47:34 |
Message-ID: | 714295.1741286854@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Andres Freund <andres(at)anarazel(dot)de> writes:
> And in contrast to analyzing the database in parallel, the pg_dump/restore
> work to restore stats afaict happens single-threaded for each database.
In principle we should be able to do stats dump/restore parallelized
just as we do for data. There are some stumbling blocks in the way
of that:
1. pg_upgrade has made a policy judgement to apply parallelism across
databases, not within a database, i.e., it will launch concurrent dump/
restore tasks in different DBs but not authorize any one of them to
eat multiple CPUs. That needs to be re-thought probably, as I think
that decision dates to before we had useful parallelism in pg_dump and
pg_restore. I wonder if we could just rip out pg_upgrade's support
for DB-level parallelism, which is not terribly pretty anyway, and
simply pass the -j switch straight to pg_dump and pg_restore.
2. pg_restore should already be able to perform stats restores in
parallel (if authorized to use multiple threads), but I'm less clear
on whether that works right now for pg_dump.
3. Also, parallel restore depends critically on the TOC entries'
dependencies being sane, and right now I do not think they are.
I looked at "pg_restore -l -v" output for the regression DB, and it
seems like it's not taking care to ensure that table/MV data is loaded
before the table/MV's stats. (Maybe that accounts for some of the
complaints we've seen about stats getting mangled??)
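To make point 3 concrete, here is a rough sketch of the kind of check one could run over a TOC to see whether each stats entry is ordered after its table's data. It does not parse real "pg_restore -l -v" output; the TocEntry class, its field names, and the entry labels ("TABLE DATA", "MATERIALIZED VIEW DATA", "STATISTICS DATA") are assumptions made for illustration only, not pg_dump's actual data structures.

```python
from dataclasses import dataclass, field


@dataclass
class TocEntry:
    """Hypothetical, simplified stand-in for one archive TOC entry."""
    dump_id: int
    desc: str   # assumed labels, e.g. "TABLE", "TABLE DATA", "STATISTICS DATA"
    tag: str    # object name, e.g. "public.t"
    deps: set[int] = field(default_factory=set)  # dump_ids this entry depends on


def reachable(by_id, start):
    """Return every dump_id that `start` depends on, directly or transitively."""
    seen = set()
    stack = list(by_id[start].deps)
    while stack:
        dep = stack.pop()
        if dep in seen or dep not in by_id:
            continue
        seen.add(dep)
        stack.extend(by_id[dep].deps)
    return seen


def stats_missing_data_dep(entries):
    """List tags whose stats entry does not depend on the matching data entry."""
    by_id = {e.dump_id: e for e in entries}
    data_ids = {
        e.tag: e.dump_id
        for e in entries
        if e.desc in ("TABLE DATA", "MATERIALIZED VIEW DATA")
    }
    bad = []
    for e in entries:
        if e.desc != "STATISTICS DATA":
            continue
        data_id = data_ids.get(e.tag)
        # Without this dependency, a parallel restore is free to load the
        # stats before (or while) the table's data is being loaded.
        if data_id is not None and data_id not in reachable(by_id, e.dump_id):
            bad.append(e.tag)
    return bad


entries = [
    TocEntry(1, "TABLE", "public.t"),
    TocEntry(2, "TABLE DATA", "public.t", {1}),
    TocEntry(3, "STATISTICS DATA", "public.t", {1}),  # depends on the table only
    TocEntry(4, "TABLE", "public.u"),
    TocEntry(5, "TABLE DATA", "public.u", {4}),
    TocEntry(6, "STATISTICS DATA", "public.u", {5}),  # depends on the data
]
print(stats_missing_data_dep(entries))
```

In this made-up TOC, the stats entry for public.t depends only on the table definition, so nothing forces it to wait for the data load; the entry for public.u is ordered correctly.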
> I think the stats need to be handled much more like we handle the actual table
> data, which are obviously *not* stored in memory for the whole run of pg_dump.
+1
regards, tom lane