From: Gaëtan Allart <gaetan(at)nexylan(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>, Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
Cc: Robert Treat <rob(at)xzilla(dot)net>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: General performance/load issue
Date: 2011-11-25 22:47:32
Message-ID: CAF5D805.DCC9%gaetan@nexylan.com
Lists: pgsql-general
Hello Tomas and Cédric,
Right now, the server is not doing well: load is above 30 and queries are
slow as hell.
Here's the complete iotop output. Note the 71 MB/s of writes (apparently
from SELECT queries).
Total DISK READ: 633.35 K/s | Total DISK WRITE: 71.06 M/s
  TID  PRIO  USER      DISK READ    DISK WRITE   SWAPIN      IO>    COMMAND
27352  be/4  postgres   15.64 K/s    86.01 K/s   0.00 %   99.99 %  postgres: database database 176.31.228.6(38816) SELECT
20226  be/4  postgres    7.82 K/s     0.00 B/s   0.00 %   99.99 %  postgres: database database 176.31.228.6(34166) SELECT
26950  be/4  postgres   23.46 K/s     0.00 B/s   0.00 %   82.14 %  postgres: database database 46.105.104.205(40820) SELECT
23160  be/4  postgres    3.91 K/s     0.00 B/s   0.00 %   81.14 %  postgres: database database 46.105.104.205(58091) SELECT
29184  be/4  postgres    7.82 K/s     0.00 B/s   0.00 %   79.17 %  postgres: database database 46.105.104.205(51047) SELECT
27271  be/4  postgres   23.46 K/s   234.58 K/s   0.00 %   77.15 %  postgres: database database 46.105.104.205(42315) SELECT
28224  be/4  postgres    0.00 B/s     0.00 B/s   0.00 %   74.09 %  postgres: database database 46.105.104.205(49871) SELECT
27450  be/4  postgres   23.46 K/s     0.00 B/s   0.00 %   70.12 %  postgres: database database 46.105.104.205(49096) SELECT
29206  be/4  postgres  121.20 K/s     0.00 B/s   0.00 %   62.99 %  postgres: database database 176.31.228.6(39787) SELECT
 8202  be/4  postgres  129.02 K/s     0.00 B/s   0.00 %   60.59 %  postgres: database database 91.121.89.14(57291) SELECT
29209  be/4  postgres    3.91 K/s     0.00 B/s   0.00 %   55.75 %  postgres: database database 176.31.228.6(39807) idle
27046  be/4  postgres    3.91 K/s     0.00 B/s   0.00 %   52.35 %  postgres: database database 46.105.104.205(41285) SELECT
29217  be/4  postgres   54.73 K/s     0.00 B/s   0.00 %   50.18 %  postgres: database database 46.105.104.205(51178) SELECT
 3249  be/4  root        3.91 K/s   320.59 K/s   0.00 %   40.31 %  [kjournald]
28918  be/4  postgres    7.82 K/s     0.00 B/s   0.00 %   34.83 %  postgres: database database 46.105.104.205(50164) SELECT
26856  be/4  postgres    7.82 K/s   234.58 K/s   0.00 %   34.30 %  postgres: database database 46.105.104.205(40589) SELECT
29205  be/4  postgres    7.82 K/s     0.00 B/s   0.00 %   34.17 %  postgres: database database 176.31.228.6(39784) SELECT
 4175  be/4  root        0.00 B/s     0.00 B/s   0.00 %    1.84 %  [flush-8:0]
28905  be/4  postgres    0.00 B/s     0.00 B/s   0.00 %    0.02 %  postgres: database database 46.105.104.205(50125) SELECT
28919  be/4  postgres    0.00 B/s     0.00 B/s   0.00 %    0.01 %  postgres: database database 46.105.104.205(50167) SELECT
19807  be/4  postgres    3.91 K/s     0.00 B/s   0.00 %    0.00 %  postgres: database database 176.31.228.6(33598) SELECT
28703  be/4  postgres    0.00 B/s    86.01 K/s   0.00 %    0.00 %  postgres: database database 176.31.228.6(39260) SELECT
29183  be/4  postgres    7.82 K/s     0.00 B/s   0.00 %    0.00 %  postgres: database database 46.105.104.205(51044) SELECT
Here is dirty_expire_centisecs:
cat /proc/sys/vm/dirty_expire_centisecs
3000
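In case it's useful, here is a quick sketch for dumping the neighbouring
kernel writeback settings in one go (these are the standard Linux
/proc/sys/vm files, shown for illustration):

# sketch: list all dirty-writeback settings with their current values
grep . /proc/sys/vm/dirty_*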
The bgwriter configuration is at its defaults:
#bgwriter_delay = 200ms # 10-10000ms between rounds
#bgwriter_lru_maxpages = 100 # 0-1000 max buffers written/round
#bgwriter_lru_multiplier = 2.0 # 0-10.0 multiplier on buffers scanned/round
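For the pg_stat_bgwriter snapshots you asked for, I would collect them
roughly like this (just a sketch; the -U role and -d database names below
are placeholders for the real ones):

# sketch: two pg_stat_bgwriter snapshots, 5 minutes apart, while the box is slow
psql -U postgres -d database -c "SELECT now(), * FROM pg_stat_bgwriter;"
sleep 300
psql -U postgres -d database -c "SELECT now(), * FROM pg_stat_bgwriter;"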
Is there anything else I can provide to help you?
Thanks again,
Gaëtan
On 25/11/11 14:12, "Tomas Vondra" <tv(at)fuzzy(dot)cz> wrote:
>On 25 November 2011, 12:43, Cédric Villemain wrote:
>> On 25 November 2011 11:25, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>>> On 24 November 2011, 23:19, Cédric Villemain wrote:
>>>>
>>>> It seems you have an issue with your checkpoint sync times; it is
>>>> fixed in 9.1 and backported to 9.0 here:
>>>> http://projects.2ndquadrant.com/backports
>>>
>>> People generally don't want to apply backports on their own, especially
>>> when it's a production server and when it's unclear whether it actually
>>> fixes the issue they have. I'm not sure about that.
>>
>> I agree that most people don't want to do that themselves, but if it
>> happens to be the solution they can proceed or ask someone to do it.
>> People want to see their production system back to a normal situation;
>> the limited information here is not enough to be sure, but the
>> checkpoint sync times are clear: the sync times are not right.
>> It is very probable that compacting the fsyncs will help, but it is not
>> certain that it is required yet.
>
>Yes, the sync times are quite crazy, especially given the tiny number of
>buffers to be written and the fact that the SSD should handle random I/O
>quite well.
>
>>>> It is possible you have other problems that explain the issue you
>>>> have. An immediate solution before trying a patch is to reduce your
>>>> shared_buffers setting to something very low, like 1GB.
>>>
>>> Well, a low shared_buffers setting was used mostly before 8.3, when
>>> spread checkpoints were not available. It prevents the I/O overload when
>>> the database suddenly decides to write all of the dirty buffers. But he's
>>> on 9.0 (so he already has spread checkpoints).
>>
>> It is a different animal here.
>>
>>>
>>> Plus the number of buffers he's writing is negligible - usually about
>>> 700 buffers (6MB), 3192 buffers (25MB) at most. That surely should not
>>> be a problem for the SSD he's using.
>>
>> See the blog entry from Greg Smith:
>>
>> http://blog.2ndquadrant.com/en/2011/06/backporting-and-checkpoint-tro.html
>>
>> And the slides of his talk at pgconf2011:
>>
>> http://www.2ndquadrant.com/static/2quad/media/pdfs/talks/WriteStuff-PGCon2011.pdf
>>
>> I was just pointing out that there are known issues in this area, with
>> known solutions.
>
>Thanks for the links, interesting stuff. Still, my impression is that the
>SSD is stressed by something else, and the high fsync values during a
>checkpoint are merely a symptom. So fixing the checkpoints (using the
>backpatch) won't actually fix the issue. But I'm just guessing here.
>
>> Getting more information on vacuum activity and bgwriter activity should
>> help too.
>
>Yes, that'd be nice. Gaëtan, can you post bgwriter-related options from
>postgresql.conf and two snapshots from pg_stat_bgwriter (say 5 minutes
>apart, collected when the db is slow)? A complete 'iotop -o' output would
>be nice too.
>
>BTW what filesystem are we talking about? What mount options have you
>used?
>
>What about /proc/sys/vm/dirty_expire_centisecs?
>
>Tomas