Re: Backup/dump of huge tables and performance

From: brianb-pggeneral(at)edsamail(dot)com
To: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
Cc: pgsql-general(at)postgresql(dot)org, paul(at)edsamail(dot)com
Subject: Re: Backup/dump of huge tables and performance
Date: 2000-07-28 10:11:06
Message-ID: 20000728101106.14925.qmail@mail01.edsamail.com.ph
Lists: pgsql-general


Hi Philip,

Philip Warner writes:
> At 02:26 28/07/00 GMT, brianb-pggeneral(at)edsamail(dot)com wrote:
> >1. pg_dump or COPY take up a lot of I/O resources. That's not surprising
> >considering the size of the tables, but the impact on the overall
> >production system's performance is not acceptable.
>
> Firstly, you are not using '--inserts', I hope. It is very slow for both
> backup & restore. Also, do you know if pg_dump is the IO bottleneck, or the
> backend?

Nope, I don't use --inserts. The bottleneck seems to be at the backend,
since it also happens with COPY.
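
To be concrete, "it also happens with COPY" means even a plain server-side
copy of one of the big tables, e.g. (table name made up):

    psql -c "COPY bigtable TO '/tmp/bigtable.out'" mydb

produces the same sustained disk load, so pg_dump itself doesn't look like
the culprit.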

> Other than that, I'm not sure what can be done about it - the I/O has to be
> done some time. Possibly causing pg_dump to (optionally) pause between
> records, but that seems like a bad idea, especially with 10s of millions of
> records.

> You could always do a file-based backup of the database, and restore it
> somewhere else, and drop the tables you don't need. Not very elegant, I
> realize.

What file-based backup are you referring to? From reading previous posts,
I'd inferred that I couldn't just copy my /var/lib/pgsql over to another
server with Postgres and start postmaster.
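
If you mean shutting down the postmaster and copying the data directory
wholesale, something like (paths are just from my setup):

    pg_ctl stop -D /var/lib/pgsql/data
    tar czf /backup/pgdata.tar.gz /var/lib/pgsql/data
    pg_ctl start -D /var/lib/pgsql/data

then my understanding is the copy is only consistent while the postmaster
is stopped, which means downtime on the production box.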

> >Any better ideas? Would this be a good feature to incorporate into future
> >versions of pg_dump?
>
> I'm not sure what should be changed in pg_dump; delaying between records
> seems like a bad idea since it does the dump in a single TX, and besides,
> sleeping while a TX is open seems evil to me.

I suppose it's a special case, then. I just need to move the table, and I'd
rather have the I/O load spread over a longer period of time to soften the
impact on the rest of the system.
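
For what it's worth, what I was picturing was throttling the dump stream in
the pipe, along these lines (an untested sketch; the chunk size and sleep
interval are made-up knobs):

    pg_dump -t bigtable mydb \
      | perl -e 'while (read(STDIN, $buf, 1 << 20)) {  # 1MB chunks
                   print $buf;
                   select(undef, undef, undef, 0.5);   # ~0.5s pause per chunk
                 }' \
      > bigtable.dump

though as you point out, that holds the transaction open even longer.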

> I think making multiple files can be done by 'split', so the real issue is
> where the IO problem comes from, and how to reduce it. If pg_dump is the
> source of the I/O, then I can try to address it, but if the COPY command is
> the problem, that needs to be done by someone else...

And thanks for suggesting "split". I had forgotten it existed and was about
to write it from scratch.
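
For the archives, the plan is roughly (assuming GNU split's -b with a size
suffix):

    pg_dump mydb | split -b 1000m - mydb.dump.

with the pieces reassembled at restore time by:

    cat mydb.dump.* | psql newdb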

Brian
--
Brian Baquiran <brianb(at)edsamail(dot)com>
http://www.baquiran.com/ AIM: bbaquiran
Work: +63 (2) 7182222  Home: +63 (2) 9227123

I'm smarter than average. Therefore, average, to me, seems kind of stupid.
People weren't purposely being stupid. It just came naturally.
-- Bruce "Tog" Tognazzini
