From: | Steve Crawford <scrawford(at)pinpointresearch(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
Cc: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: pgstat wait timeout |
Date: | 2011-12-28 18:05:49 |
Message-ID: | 4EFB5A7D.70904@pinpointresearch.com |
Lists: | pgsql-hackers |
On 12/28/2011 09:34 AM, Alvaro Herrera wrote:
> Excerpts from Steve Crawford's message of Wed Dec 28 13:24:37 -0300 2011:
>> On 12/28/2011 05:05 AM, Alvaro Herrera wrote:
>>> Excerpts from Steve Crawford's message of Tue Dec 27 22:51:06 -0300 2011:
>>>> I have a system (9.0.4 on Ubuntu Server 10.04 LTS x86_64) that is
>>>> currently in test/dev mode. I'm currently seeing the following messages
>>>> occurring every few seconds:
>>>>
>>>> ...
>>>> Dec 27 17:43:22 foo postgres[23693]: [6-1] : WARNING: pgstat wait timeout
>>>> Dec 27 17:43:27 foo postgres[27324]: [71400-1] : WARNING: pgstat wait timeout
>>>> Dec 27 17:43:33 foo postgres[23695]: [6-1] : WARNING: pgstat wait timeout
>>>> Dec 27 17:43:54 foo postgres[27324]: [71401-1] : WARNING: pgstat wait timeout
>>> Hm, so can you strace the stats collector to see what it's doing? Maybe
>>> grab a backtrace with GDB from it before anything else.
>>>
>>> My guess is 27324 is the autovac launcher and the others are autovac
>>> workers just as they die.
>>>
>> You are correct. 27324 is the launcher and the others are autovac
>> workers. Here's the strace of the stats collector process:
>>
>> getppid() = 27320
>> poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
>> getppid() = 27320
>> poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
>> getppid() = 27320
>> poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
>> ....rinse...lather...repeat...ad nauseam...
> Weird ... even across more "pgstat wait timeout" messages? It's like
> it's not getting the "inquiry" messages that would tell it to write the
> file ... something wrong with the UDP socket perhaps?
>
Bingo!
postgres 27325 postgres 8u *IPv6* 5379428 0t0 UDP localhost:47204->localhost:47204
While diagnosing a network timeout issue over an IPv4-to-IPv4 VPN, I
disabled IPv6 via sysctl on this machine and then pretty much forgot
about it, since we are still IPv4 internally. But PostgreSQL had already
established a (now non-functional) IPv6 local connection. IPv6 turned
out to be unrelated to the VPN timeouts, so I re-enabled it, and that
corrected the "pgstat wait timeout" issue.
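For anyone wanting to check for the same condition, here is a minimal sketch, assuming Python 3 is available on the box. It is not PostgreSQL's code; the helper name `probe_udp_loopback` and the literal addresses are my own choices for illustration. It binds a UDP socket on the loopback address, connects the socket to itself, and checks whether a one-byte datagram comes back, once for IPv4 and once for IPv6.

```python
import select
import socket


def probe_udp_loopback(family, address, timeout=2.0):
    """Return True if a UDP socket bound to `address` can send a datagram to itself.

    Hypothetical diagnostic helper, not part of PostgreSQL.
    """
    try:
        sock = socket.socket(family, socket.SOCK_DGRAM)
    except OSError:
        # The address family is not supported at all.
        return False
    try:
        sock.bind((address, 0))            # let the kernel pick a free port
        sock.connect(sock.getsockname())   # connect the socket to itself
        sock.send(b"x")
        # Wait up to `timeout` seconds for the datagram to arrive.
        readable, _, _ = select.select([sock], [], [], timeout)
        return bool(readable) and sock.recv(1) == b"x"
    except OSError:
        # bind/connect/send failed, e.g. the address is not configured.
        return False
    finally:
        sock.close()


if __name__ == "__main__":
    checks = (
        ("IPv4", socket.AF_INET, "127.0.0.1"),
        ("IPv6", socket.AF_INET6, "::1"),
    )
    for name, family, address in checks:
        ok = probe_udp_loopback(family, address)
        print(f"{name} loopback UDP: {'ok' if ok else 'FAILED'}")
```

Note that this only tests a freshly created socket. A socket a running server opened before IPv6 was disabled, like the one above, stays open regardless, so a failed IPv6 probe combined with an existing IPv6 UDP socket is the combination to look for.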
Cheers,
Steve