From: | Hans Buschmann <buschmann(at)nidsa(dot)net> |
---|---|
To: | "tgl(at)sss(dot)pgh(dot)pa(dot)us" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Assorted improvements in pg_dump |
Date: | 2021-10-22 16:36:27 |
Message-ID: | 7d7eb6128f40401d81b3b7a898b6b4de@W2012-02.nidsa.loc |
Lists: | pgsql-hackers |
Hello Tom!
I noticed you are improving pg_dump just now.
Some time ago I experimented with dumping a customer database in parallel directory mode (-F directory, with -j set to 2-4).
I noticed it took quite a long time to complete.
Further investigation showed that in this mode with multiple jobs the tables are processed in decreasing size order, which makes sense to avoid a long tail of a big table in one of the jobs prolonging overall dump time.
Exactly one table took very long, but seemed to be of moderate size.
But the size determination fails to consider the size of toast tables, and this table had a big associated toast table for its bytea column(s).
Even with an analyze at loading time there was no size information for the toast table in the catalog tables.
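To illustrate the effect, here is a small Python sketch. It is not pg_dump's actual code: it assumes a simplified model in which tables are sorted by an estimated size, each table goes to whichever job becomes free first, and dump time is proportional to heap plus toast size. The table names and sizes are made up.

```python
import heapq

def makespan(tables, jobs, size_key):
    """Total elapsed time when tables are dispatched largest-first by size_key.

    Assumed model: each table is handed to the job that becomes idle first,
    and its real dump time is heap + toast, whatever the estimate was.
    """
    order = sorted(tables, key=size_key, reverse=True)
    free_at = [0] * jobs          # min-heap: time at which each job is idle
    heapq.heapify(free_at)
    for t in order:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + t["heap"] + t["toast"])
    return max(free_at)

# Hypothetical tables; "docs" has a small heap but a large toast table.
tables = [
    {"name": "a", "heap": 50, "toast": 0},
    {"name": "b", "heap": 40, "toast": 0},
    {"name": "c", "heap": 30, "toast": 0},
    {"name": "d", "heap": 20, "toast": 0},
    {"name": "docs", "heap": 10, "toast": 90},
]

# Ordering by heap size only schedules "docs" last, leaving a long tail.
by_heap = makespan(tables, 2, lambda t: t["heap"])
# Ordering by heap + toast starts "docs" first.
by_total = makespan(tables, 2, lambda t: t["heap"] + t["toast"])

print(by_heap, by_total)  # 170 120
```

In this toy case, ignoring toast makes the two-job dump take 170 units instead of 120, because the table with the big toast table is started last and runs alone at the end.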
I thought of the following alternatives to ameliorate this:
1. Using pg_table_size() function in the catalog query
Pos: This reflects the correct size of every relation
Neg: This goes out to disk and may have a huge impact on databases with very many tables
2. Teaching vacuum to set the toast-table size like it sets it on normal tables
3. Have a command/function for occasionally setting the (approximate) size of toast tables
I think with further work under way (not yet ready), pg_dump can really profit from parallel, non-compressing mode, especially considering the huge amount of bytea/blob/string data in many big customer scenarios.
Thoughts?
Hans Buschmann