From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Sergio Gabriel Rodriguez <sgrodriguez(at)gmail(dot)com> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: problems with large objects dump |
Date: | 2012-10-13 01:31:54 |
Message-ID: | 27767.1350091914@sss.pgh.pa.us |
Lists: | pgsql-performance |
I wrote:
> Sergio Gabriel Rodriguez <sgrodriguez(at)gmail(dot)com> writes:
>> I never use oprofile, but for a few hours into the process, I could take
>> this report:
>> 1202449 56.5535 sortDumpableObjects
> Hm. I suspect a lot of that has to do with the large objects; and it's
> really overkill to treat them as full-fledged objects since they never
> have unique dependencies. This wasn't a problem when commit
> c0d5be5d6a736d2ee8141e920bc3de8e001bf6d9 went in, but I think now it
> might be because of the additional constraints added in commit
> a1ef01fe163b304760088e3e30eb22036910a495. I wonder if it's time to try
> to optimize pg_dump's handling of blobs a bit better. But still, any
> such fix probably wouldn't make a huge difference for you. Most of the
> time is going into pushing the blob data around, I think.
For fun, I tried adding 5 million empty blobs to the standard regression
database, and then did a pg_dump. It took a bit under 9 minutes on my
workstation. oprofile showed about 32% of pg_dump's runtime going into
sortDumpableObjects, which might make you think that's worth optimizing
... until you look at the bigger picture system-wide:
samples| %|
------------------
727394 59.4098 kernel
264874 21.6336 postgres
136734 11.1677 /lib64/libc-2.14.90.so
39878 3.2570 pg_dump
37025 3.0240 libpq.so.5.6
17964 1.4672 /usr/bin/wc
354 0.0289 /usr/bin/oprofiled
So sortDumpableObjects actually took only about 1% of the total CPU cycles
(roughly 32% of pg_dump's 3.26% share).
And remember this is with empty objects. If we'd been shoving 200GB of
data through the dump, the data pipeline would surely have swamped all
else.
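For anyone wanting to check the arithmetic behind the 1% figure, here is a minimal sketch that recomputes it from the sample counts in the report above. The 0.32 factor is the approximate "about 32%" share quoted earlier, and the total is summed only over the rows shown, so it may slightly undercount if oprofile omitted smaller entries. The function and variable names are illustrative, not part of oprofile or pg_dump.

```python
# System-wide oprofile sample counts, copied from the report above.
samples = {
    "kernel": 727394,
    "postgres": 264874,
    "/lib64/libc-2.14.90.so": 136734,
    "pg_dump": 39878,
    "libpq.so.5.6": 37025,
    "/usr/bin/wc": 17964,
    "/usr/bin/oprofiled": 354,
}

# Approximate fraction of pg_dump's own samples attributed to
# sortDumpableObjects ("about 32%" in the message; not an exact value).
SORT_SHARE_OF_PG_DUMP = 0.32


def system_wide_share(binary_samples, total_samples, share_within_binary):
    """Fraction of all samples spent in one function of one binary."""
    return share_within_binary * binary_samples / total_samples


# Total over the listed rows only (assumption: nothing significant is missing).
total = sum(samples.values())
pg_dump_share = samples["pg_dump"] / total
sort_share = system_wide_share(samples["pg_dump"], total, SORT_SHARE_OF_PG_DUMP)

print(f"pg_dump share of all samples: {pg_dump_share:.2%}")
print(f"sortDumpableObjects share of all samples: {sort_share:.2%}")
```

The second figure comes out at roughly 1%, which is why a function that dominates pg_dump's own profile is still a minor cost for the dump as a whole.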
So I think the original assumption that we didn't need to optimize
pg_dump's object management infrastructure for blobs still holds good.
If there's anything that is worth fixing here, it's the number of server
roundtrips being used ...
regards, tom lane