From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: brianb-pggeneral(at)edsamail(dot)com, pgsql-general(at)postgresql(dot)org
Cc: paul(at)edsamail(dot)com
Subject: Re: Backup/dump of huge tables and performance
Date: 2000-07-28 06:47:15
Message-ID: 3.0.5.32.20000728164715.026dc640@mail.rhyme.com.au
Lists: pgsql-general
At 02:26 28/07/00 GMT, brianb-pggeneral(at)edsamail(dot)com wrote:
>
>1. pg_dump or COPY take up a lot of I/O resources. That's not surprising
>considering the size of the tables, but the impact on the overall
>production system's performance is not acceptable.
Firstly, I hope you are not using '--inserts'; it is very slow for both
backup and restore. Also, do you know whether pg_dump or the backend is
the I/O bottleneck?
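For the record, the default dump format (COPY-based) is what you want for
tables this size; 'mydb' below is only a stand-in for your database name:

    pg_dump mydb > mydb.dump            # default: COPY-based data, restores quickly
    pg_dump --inserts mydb > mydb.dump  # one INSERT per row - avoid for big tables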
Other than that, I'm not sure what can be done about it - the I/O has to be
done some time. One option would be to have pg_dump (optionally) pause
between records, but that seems like a bad idea, especially with tens of
millions of records.
Maybe someone who knows more about backend processing can suggest whether
a different kind of retrieval in the COPY command would help.
>2. I don't know how to estimate the size of the resulting dump files from
>the database files.
Not very easy, unless you have very similar data in each tuple...and in the
future pg_dump will support compression, so the size will be even harder to
estimate.
>I would very much prefer to have the backup files in little 10-20MB chunks,
>rather than one humongous dumpfile.
Maybe: pg_dump | split --bytes=10m
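As a rough sketch (the chunk size, database name and file prefix are all
placeholders):

    pg_dump mydb | split --bytes=10m - mydb.dump.
    # produces mydb.dump.aa, mydb.dump.ab, ... in roughly 10MB pieces

    cat mydb.dump.* | psql newdb
    # to restore, reassemble the pieces and feed them to psql

That should give you 10-20MB chunks without touching pg_dump at all.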
> I also want to be able to run the
>backup without shutting down the service that uses the database.
AFAIK, you don't need to shut it down, or are you referring to the
performance problems?
>As noted above, this particular table is no longer being used in
>transactions. I figure I could write a C program to declare a cursor and
>write it out in n-MB or m-thousand row chunks, with rest pauses in between.
You could always do a file-based backup of the database, restore it
somewhere else, and drop the tables you don't need. Not very elegant, I
realize.
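If you do go the cursor route, the skeleton of it in psql might look
something like the sketch below. The table name, fetch size and output
file names are made up, and plain SELECT output won't match COPY/pg_dump
format, so treat it as an illustration only:

    psql mydb <<'EOF'
    -- unaligned, tuples-only output so each row comes out as one plain line
    \a
    \t
    BEGIN;
    DECLARE huge_cur CURSOR FOR SELECT * FROM huge_table;
    -- send each batch of rows to its own file via \o
    \o chunk_0001.out
    FETCH 10000 FROM huge_cur;
    \o chunk_0002.out
    FETCH 10000 FROM huge_cur;
    -- ...repeat (or script it) until a FETCH returns no rows...
    \o
    CLOSE huge_cur;
    COMMIT;
    EOF

You would still have to stitch the chunks back together at load time, of
course.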
>Any better ideas? Would this be a good feature to incorporate into future
>versions of pg_dump?
I'm not sure what should be changed in pg_dump; delaying between records
seems like a bad idea since it does the dump in a single TX, and besides,
sleeping while a TX is open seems evil to me.
I think producing multiple files can be handled by 'split', so the real
issue is where the I/O problem comes from, and how to reduce it. If pg_dump
is the source of the I/O, then I can try to address it, but if the COPY
command is the problem, that needs to be done by someone else...
----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/