From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, alvherre(at)alvh(dot)no-ip(dot)org |
Subject: | Re: Statistics Import and Export |
Date: | 2025-03-06 20:50:46 |
Message-ID: | Z8oKpqB6OSrHYHQK@nathan |
Lists: | pgsql-hackers |
On Thu, Mar 06, 2025 at 03:20:16PM -0500, Andres Freund wrote:
> There are many systems with hundreds of databases, removing all parallelism
> for those from pg_upgrade would likely hurt way more than what we can gain
> here.
I just did a quick test on a freshly analyzed database with 1,000 sequences
and 10,000 tables with 1,000 rows and 2 unique constraints apiece.
~/pgdata$ time pg_dump postgres --no-data --binary-upgrade > /dev/null
0.29s user 0.09s system 21% cpu 1.777 total
~/pgdata$ time pg_dump postgres --no-data --no-statistics --binary-upgrade > /dev/null
0.14s user 0.02s system 25% cpu 0.603 total
So about 1.174 seconds goes to statistics. Even if we do all sorts of work
to make dumping statistics really fast, dumping 8 such databases in
succession would still take at least 4.8 seconds (8 x 0.603). Even with the
current code, dumping 8 in parallel would probably take closer to 2 seconds,
and I bet reducing the number of statistics queries could drive it below 1.
Granted, I'm waving my hands vigorously with those last two estimates.
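For anyone who wants to check the arithmetic behind those numbers, here is a minimal sketch in Python. It uses only the two wall-clock totals reported above; the database count of 8 comes from the message, while the variable names and the "perfect parallelism" figure are illustrative assumptions, not measurements.

```python
# Wall-clock totals reported by `time` above, in seconds.
with_stats = 1.777      # pg_dump --no-data --binary-upgrade
without_stats = 0.603   # same, plus --no-statistics
databases = 8           # hypothetical database count used in the message

# Time attributable to dumping statistics for one database.
stats_cost = with_stats - without_stats

# Lower bound for dumping all databases one after another, assuming
# statistics could somehow be dumped for free.
serial_floor = databases * without_stats

# Serial dump of all databases with the current code.
serial_current = databases * with_stats

# Assumption: with perfect parallelism and no contention, the wall-clock
# time is roughly that of a single dump. Real runs will be slower.
parallel_ideal = with_stats

print(f"statistics cost per database: {stats_cost:.3f}s")
print(f"serial floor, {databases} databases: {serial_floor:.3f}s")
print(f"serial, current code: {serial_current:.3f}s")
print(f"parallel, idealized: {parallel_ideal:.3f}s")
```

The idealized parallel figure sits a little under the "closer to 2 seconds" guess, which is consistent with allowing some slack for contention between concurrent dumps.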
That being said, I do think in-database parallelism would be useful in some
cases. I frequently hear about problems with huge numbers of large objects
on a cluster with one big database. But that's probably less likely than
the many-database case.
--
nathan