From: | jesper(at)krogh(dot)cc |
---|---|
To: | jesper(at)krogh(dot)cc |
Cc: | "Alvaro Herrera" <alvherre(at)2ndquadrant(dot)com>, pgsql-admin(at)postgresql(dot)org |
Subject: | Re: Autovacuum not keeping up. (PG 9.2.9) |
Date: | 2014-08-01 09:11:43 |
Message-ID: | ea5b920abfccddd0707920001041d0af.squirrel@shrek.krogh.cc |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-admin |
>> jesper(at)krogh(dot)cc wrote:
>>> Hi.
>>>
>>> I have a large database with a message queue table, that has high
>>> activity. The database supports 1-300 client connection concurrently,
>>> having transactions open in up to 30 minutes each.
>>>
>>> Recently I am seeing autuvacuum being issued, but it takes
>>> ages to get through the message queue table, with strace showing
>>> waiting
>>> for semop's for 10's to 100's of seconds.
>>
>> Do you have data on how relfrozenxid advances for that table?
>
> Not really, how would you normally pick that out?
> 2014-08-01 10:49:39.171 db=# select relname, age(relfrozenxid),
> relfrozenxid FROM pg_class WHERE relkind = 'r' and relname = 'job';
> relname | age | relfrozenxid
> ---------+-----------+--------------
> job | 111893622 | 796259097
> (1 row)
>
> Time: 1.913 ms
>
> This shouldn't qualify for a freeze vacuum, should it?
>
>> Vacuuming needs to grab a "cleanup lock" on each page it's going to
>> vacuum, which is a special kind of lock that requires that no other
>> process is even looking at the page at that moment (we call this "to
>> have the page pinned"), which is even weaker than having a shared lock
>> on the page. If traffic to some pages is high, it might be difficult
>> for vacuum to acquire this.
>
> Based on strace output this looks excactly like whats going on, there are
> some activity, then it pauses for some time then continues.
Very "scientifically" i have captured this strace output of the autovacuum
pid. This is a 10 minute trace (600s), total waiting time for
semop(2850903 in those minutes are not less than 485s, (last semop not
counted in).
jk(at)srv01:~$ sudo timelimit -t 600 -T 10 strace -T -p 36839 2>&1 | grep -v
0.000
Process 36839 attached - interrupt to quit
semop(2850903, {{11, -1, 0}}, 1) = 0 <14.403021>
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) <0.001100>
write(367, "\\D\1\0\30p\325\37\1\0\0\0t\6\200\6\360\37\4 \0\0\0\0\340\237
\0\200\206 \0"..., 8192) = 8192 <0.002371>
semop(2850903, {{11, -1, 0}}, 1) = 0 <38.871837>
semop(2850903, {{11, -1, 0}}, 1) = 0 <17.111822>
semop(2850903, {{11, -1, 0}}, 1) = 0 <22.438373>
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) <0.001095>
semop(2850903, {{11, -1, 0}}, 1) = 0 <32.521132>
semop(2261061, {{8, 1, 0}}, 1) = 0 <0.001601>
semop(2850903, {{11, -1, 0}}, 1) = 0 <10.740716>
semop(1179684, {{9, 1, 0}}, 1) = 0 <0.001900>
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) <0.001106>
semop(2850903, {{11, -1, 0}}, 1) = 0 <26.610242>
semop(2850903, {{11, -1, 0}}, 1) = 0 <21.911487>
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) <0.001060>
semop(2850903, {{11, -1, 0}}, 1) = 0 <0.001694>
semop(2850903, {{11, -1, 0}}, 1) = 0 <128.996370>
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) <0.001390>
semop(2850903, {{11, -1, 0}}, 1) = 0 <20.553280>
read(367, "\\D\1\0H\2107\357\1\0\0\0|\3`\22\360\37\4 \0\0\0\0\340\237
\0\320\237 \0"..., 8192) = 8192 <0.001670>
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) <0.001357>
semop(2850903, {{11, -1, 0}}, 1) = 0 <0.001046>
semop(2850903, {{11, -1, 0}}, 1) = 0 <0.001352>
semop(2850903, {{11, -1, 0}}, 1) = 0 <44.311165>
semop(2850903, {{11, -1, 0}}, 1) = 0 <79.246222>
read(367, "]D\1\0\10\"\3-\1\0\0\0|\3`\22\360\37\4 \0\0\0\0\340\237
\0\320\237 \0"..., 8192) = 8192 <0.001124>
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) <0.001093>
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) <0.001098>
semop(2850903, {{11, -1, 0}}, 1) = 0 <8.440753>
semop(2850903, {{11, -1, 0}}, 1) = 0 <0.001390>
semop(2850903, {{11, -1, 0}}, 1) = 0 <27.041010>
semop(2850903, {{11, -1, 0}}, 1) = 0 <0.001679>
semop(2850903, {{11, -1, 0}}, 1) = 0 <0.004053>
semop(2850903, {{11, -1, 0}}, 1) = 0 <9.158556>
semop(2850903, {{11, -1, 0}}, 1) = 0 <0.001015>
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) <0.001113>
write(367, "]D\1\0\360#\177x\1\0\0\0\344\4\300\f\360\37\4 \0\0\0\0\340\237
\0\300\214 \0"..., 8192) = 8192 <0.001474>
semop(2850903, {{11, -1, 0}}, 1) = 0 <0.002450>
semop(655380, {{7, 1, 0}}, 1) = 0 <0.002245>
read(367, "]D\1\0\270\311\247\221\1\0\0\0D\5(at)\v\360\37\4 \0\0\0\0\340\237
\0(at)\213 \0"..., 8192) = 8192 <0.001339>
semop(2850903, {{11, -1, 0}}, 1) = 0 <0.002029>
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) <0.001078>
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) <0.001079>
semop(2850903, {{11, -1, 0}}, 1timelimit: sending warning signal 15
<unfinished ...>
Process 36839 detached
jk(at)srv01:~$
Is there any clue in that the semaphorenumber it waits on is excactly the
same all the time? (the <> to the right are timings in seconds.
--
Jesper
From | Date | Subject | |
---|---|---|---|
Next Message | Rural Hunter | 2014-08-01 12:37:18 | [Help needed] Update query hangs forever on high load |
Previous Message | jesper | 2014-08-01 08:59:36 | Re: Autovacuum not keeping up. (PG 9.2.9) |