From: | Tomas Vondra <tv(at)fuzzy(dot)cz> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Incredibly slow restore times after 9.0>9.2 upgrade |
Date: | 2014-10-30 19:08:36 |
Message-ID: | 54528CB4.403@fuzzy.cz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On 29.10.2014 16:12, jmcdonagh wrote:
> Hi Tomas- thank you for your thoughtful response!
>
>
> Tomas Vondra wrote
>> On 28.10.2014 21:55, jmcdonagh wrote:
>>> Hi, we have a nightly job that restores current production data to
>>> the development databases in a 'warm spare' database so that if the
>>> developers need fresh data, it's ready during the day. When we moved
>>> from 9.0 to 9.2 suddenly the restores began to take from a few hours
>>> to more like 15 hours or so. We're in Amazon EC2, I've tried new EBS
>>> volumes, warmed them up, threw IOPS at them, pretty much all the
>>> standard stuff to get more disk performance.
>>
>> So, if I understand it correctly, you've been restoring into 9.0, then
>> you switched to 9.2 and it's much slower?
>
> Yes- but since the move was done utilizing snapshots so the move
> involves new volumes, but I have created new volumes since then to
> rule out a single bad volume.
My advice would be to do some basic low-level performance tests to rule
this out. Use dd or (better) fio to test basic I/O performance, it's
much easier to spot issues that way.
> Tomas Vondra wrote
>> Is the 9.2 configured equally to 9.0? If you do something like this
>>
>> SELECT name, setting
>> FROM pg_settings
>> WHERE source = 'configuration file';
>>
>> on both versions, what do you get?
>
> I no longer have the 9.0 box up but we do track configuration via
> puppet and git. The only configuration change made for 9.2 is:
>
> -#standard_conforming_strings = off
> +standard_conforming_strings = off
Compared to 9.0, I suppose? Anyway, post the non-default config values
at least for 9.2, please.
> Cause we have an old app that needs this setting on otherwise we'd
> spend a lot of time trying to fix it.
I doubt standard_conforming_strings has anything to do with the issues.
> Tomas Vondra wrote
>>> Here's the thing, the disk isn't saturated. The behavior I'm seeing
>>> seems very odd to me; I'm seeing the source disk which holds the dump
>>> saturated by reads, which is great, but then I just see nothing being
>>> written to the postgres volume. Just nothing happening, then a
>>> small burst. There is no write queue backup on the destination disk
>>> either. if I look at pg_stat_activity I'll see something like:
>>>
>>> COPY salesforce_reconciliation (salesforce_id, email,
>>> advisor_salesforce_id, processed) FROM stdin
>>>
>>> and even for small tables, that seems to take a very long time even
>>> though the destination disk is almost at 0 utilization.
>>
>> So, where's the bottleneck? Clearly, there's one, so is it a CPU, a
>> disk or something else? Or maybe network, because you're using EBS?
>>
>> What do you mean by 'utilization'? How do you measure that?
>
> The bottleneck is I/O somehow. I say somehow, because I see iowait
> averaging about 50% between two CPUs, but there is just no writes to
> the destination EBS volume really happening, just reads from the
> disk where the source dump is located, then bursts of writes to the
> destination volume every so often. It's kind of puzzling. This is
> happening on multiple database servers, in multiple availability
> zones. Driving me bonkers.
>
> What I mean by utilization is util% from iostat -m -x 1.
I find this rather contradictory. At one moment you say the disk isn't
saturated, the next moment you say you're I/O bound.
Also, iowait (as reported e.g. by 'top') is tricky to interpret
correctly, especially on multi-cpu systems (nice intro to the complexity
[1]). It's really difficult to interpret the 50% iowait without more
info about what's happening on the machine.
IMHO, the utilization (as reported by iotop) is much easier to
interpret, because it means '% of time the device was servicing
requests'. It has issues too, because 100% does not mean 'saturated'
(especially on RAID arrays that can service multiple requests in
parallel), but it's better than iowait.
If I had to guess based from your info, I'd bet you're CPU bound, so
there's very little idle time and about 50% of it is spent waiting for
I/O requests (hence the 50% iowait). But in total the amount of I/O is
very small, so %util is ~0.
Please, post a few lines of 'iostat -x -k 1' output. Samples from 'top'
and 'vmstat 1' would be handy too.
regards
Tomas
From | Date | Subject | |
---|---|---|---|
Next Message | Albe Laurenz | 2014-10-31 09:43:56 | Re: pgtune + configurations with 9.3 |
Previous Message | Jerry Sievers | 2014-10-30 16:58:11 | Re: Incredibly slow restore times after 9.0>9.2 upgrade |