Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I'm not so worried about the amount of RAM needed as whether
> pg_dump's internal algorithms will scale to large numbers of TOC
> entries. Any O(N^2) behavior would be pretty painful, for
> example. No doubt we could fix any such problems, but it might
> take more work than we want to do right now.
I'm afraid pg_dump didn't get very far with this before:
pg_dump: WARNING: out of shared memory
pg_dump: SQL command failed
pg_dump: Error message from server: ERROR: out of shared memory
HINT: You might need to increase max_locks_per_transaction.
pg_dump: The command was: LOCK TABLE public.test2672 IN ACCESS SHARE
MODE
Given how fast it happened, I suspect that it was 2672 tables into
the dump, versus 26% of the way through 5.5 million tables.
A sampling of the vmstat 1 output lines in "baseline" state --
before the dump started:
procs -----------memory---------- ---swap-- -----io---- -system--
-----cpu------
1 0 319804 583656 23372 124473248 0 0 17224 10 1742 18995
9 1 88 2 0
3 1 319804 595840 23368 124458856 0 0 17016 10 2014 22965
9 1 89 1 0
1 0 319804 586912 23376 124469128 0 0 16808 158 1807 19181
8 1 89 2 0
2 0 319804 576304 23368 124479416 0 0 16840 5 1764 19136
8 1 90 1 0
0 1 319804 590480 23364 124459888 0 0 1488 130 3449 13844
2 1 93 3 0
0 1 319804 589476 23364 124460912 0 0 1456 115 3328 11800
2 1 94 4 0
1 0 319804 588468 23364 124461944 0 0 1376 146 3156 11770
2 1 95 2 0
1 1 319804 587836 23364 124465024 0 0 1576 133 3599 14797
3 1 94 3 0
While it was running:
procs -----------memory---------- ---swap-- -----io---- -system--
-----cpu------
2 1 429080 886244 23308 111242464 0 0 25684 38 2920 18847
7 3 85 5 0
2 1 429080 798172 23308 111297976 0 0 40024 26 1342 16967
13 2 82 4 0
2 1 429080 707708 23308 111357600 0 0 42520 34 1588 19148
13 2 81 4 0
0 5 429080 620700 23308 111414144 0 0 40272 73863 1434 18077
12 2 80 6 0
1 5 429080 605616 23308 111425448 0 0 6920 131232 729 5187
3 1 66 31 0
0 6 429080 582852 23316 111442912 0 0 10840 131248 665 4987
3 1 66 30 0
2 4 429080 584976 23308 111433672 0 0 9776 139416 693 7890
4 1 66 29 0
0 5 429080 575752 23308 111436752 0 0 10776 131217 647 6157
3 1 66 30 0
1 3 429080 583768 23308 111420304 0 0 13616 90352 1043 13047
6 1 68 25 0
4 0 429080 578888 23300 111397696 0 0 40000 44 1347 25329
12 2 79 6 0
2 1 429080 582368 23292 111367896 0 0 40320 76 1517 28628
13 2 80 5 0
2 0 429080 584960 23276 111338096 0 0 40240 163 1374 26988
13 2 80 5 0
6 0 429080 576176 23268 111319600 0 0 40328 170 1465 27229
13 2 80 5 0
4 0 429080 583212 23212 111288816 0 0 39568 138 1418 27296
13 2 80 5 0
This box has 16 CPUs, so the jump from 3% user CPU to 13% with an
increase of I/O wait from 3% to 5% suggests that pg_dump was
primarily CPU bound in user code before the crash.
I can leave this database around for a while if there are other
things you would like me to try.
-Kevin