Re: UDP buffer drops / statistics collector

From: Tim Kane <tim(dot)kane(at)gmail(dot)com>
To: PostgreSQL mailing lists <pgsql-general(at)postgresql(dot)org>
Subject: Re: UDP buffer drops / statistics collector
Date: 2017-04-19 18:36:20
Message-ID: CADVWZZK8VSMSUr5gNUwvc8_tOSozPg6Zv=612TZdwfEB0D+Crw@mail.gmail.com
Lists: pgsql-general

Well, this is frustrating..
The buffer drops are still occurring - so I thought it worth trying to use a
ramdisk and set *stats_temp_directory* accordingly.

I've reloaded the instance, and can see that the stats directory is now
being populated in the new location. *Except* - there is one last file (
pgss_query_texts.stat) that continues to be updated in the *old* pg_stat_tmp
path.. Is that supposed to happen?

Fairly similar to this guy (but not quite the same).
https://www.postgresql.org/message-id/D6E71BEFAD7BEB4FBCD8AE74FADB1265011BB40FC749@win-8-eml-ex1.eml.local

I can see the packets arriving and being consumed by the collector.. and,
the collector is indeed updating in the new stats_temp_directory.. just not
for that one file.

It also failed to resolve the buffer drops.. At this point, I'm not sure I
expected it to. They tend to occur semi-regularly (every 8-13 minutes) but
I can't correlate them with any kind of activity (and if I'm honest, it's
possibly starting to drive me a little bit mad).
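
To pin down exactly when the drops happen, something like the rough sketch
below could timestamp each increase in the kernel's UDP receive-buffer error
counter. It assumes a Linux host where /proc/net/snmp exposes a "Udp:" header
line followed by a "Udp:" value line, including a RcvbufErrors column. The
function names are made up for illustration, and the counter covers every UDP
socket in the network namespace, not only the stats collector's.

```python
import time


def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp text as a dict of ints."""
    # "Udp:" does not match the "UdpLite:" lines, so only two rows are kept:
    # the column names, then the values.
    rows = [line.split()[1:] for line in snmp_text.splitlines()
            if line.startswith("Udp:")]
    if len(rows) < 2:
        raise ValueError("no Udp header/value pair found")
    names, values = rows[0], rows[1]
    return dict(zip(names, (int(v) for v in values)))


def watch(path="/proc/net/snmp", interval=1.0):
    """Print a timestamped line whenever RcvbufErrors goes up (runs forever)."""
    last = None
    while True:
        with open(path) as f:
            drops = parse_udp_counters(f.read()).get("RcvbufErrors", 0)
        if last is not None and drops > last:
            print(time.strftime("%H:%M:%S"), "RcvbufErrors +%d" % (drops - last))
        last = drops
        time.sleep(interval)
```

Calling watch() in a terminal and leaving it running should give a list of
drop times that can be lined up against cron jobs, checkpoints, autovacuum
runs and so on.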

On Tue, Apr 18, 2017 at 2:53 PM Tim Kane <tim(dot)kane(at)gmail(dot)com> wrote:

> Okay, so I've run an strace on the collector process during a buffer drop
> event.
> I can see evidence of a recvfrom loop pulling in a *maximum* of 142kb.
>
> While I had already increased rmem_max, it would appear this is not
> being observed by the kernel.
> rmem_default is set to 124kb, which would explain the above read maxing
> out just slightly beyond this (presuming a ring buffer filling up behind
> the read).
>
> I'm going to try increasing rmem_default and see if it has any positive
> effect.. (and then investigate why the kernel doesn't want to consider
> rmem_max)..
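
A note on the rmem_max puzzle above: as far as socket(7) describes it on
Linux, rmem_default is the size a new socket gets when nothing asks for more,
while rmem_max is only the ceiling for an explicit SO_RCVBUF request. If the
collector never makes that request (an assumption, not checked against the
9.6 source), raising rmem_max alone would change nothing. The sketch below
shows the difference on a throwaway UDP socket; the helper names are
hypothetical.

```python
import socket


def read_sysctl(name):
    """Return a net.core sysctl as an int, or None if /proc is unavailable."""
    try:
        with open("/proc/sys/net/core/" + name) as f:
            return int(f.read())
    except (OSError, ValueError):
        return None


def rcvbuf_sizes(requested):
    """Return (default, after_request) SO_RCVBUF sizes for a new UDP socket."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # What the socket gets when the application asks for nothing.
        default = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        # An explicit request; the kernel may cap or adjust the value.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        after = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        s.close()
    return default, after
```

Comparing rcvbuf_sizes(65536) with read_sysctl("rmem_default") and
read_sysctl("rmem_max") on the affected host should show whether the default
tracks rmem_default there, and how far an explicit request is allowed to go.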
>
>
>
>
>
> On Tue, Apr 18, 2017 at 8:05 AM Tim Kane <tim(dot)kane(at)gmail(dot)com> wrote:
>
>> Hi all,
>>
>> I'm seeing sporadic (but frequent) UDP buffer drops on a host that so far
>> I've not been able to resolve.
>>
>> The drops are originating from postgres processes, and from what I know -
>> the only UDP traffic generated by postgres should be consumed by the
>> statistics collector - but for whatever reason, it's failing to read the
>> packets quickly enough.
>>
>> Interestingly, I'm seeing these drops occur even when the system is
>> idle.. but every 15 minutes or so (not consistently enough to isolate any
>> particular activity) we'll see in the order of ~90 packets dropped at a
>> time.
>>
>> I'm running 9.6.2, but the issue was previously occurring on 9.2.4 (on
>> the same hardware)
>>
>>
>> If it's relevant.. there are two instances of postgres running (and
>> consequently, 2 instances of the stats collector process) though 1 of those
>> instances is most definitely idle for most of the day.
>>
>> In an effort to try to resolve the problem, I've increased (x2) the UDP
>> recv buffer sizes on the host - but it seems to have had no effect.
>>
>> cat /proc/sys/net/core/rmem_max
>> 1677216
>>
>> The following parameters are configured
>>
>> track_activities on
>> track_counts on
>> track_functions none
>> track_io_timing off
>>
>>
>> There are approximately 80-100 connections at any given time.
>>
>> It seems that the issue started a few weeks ago, around the time of a
>> reboot on the given host... but it's difficult to know what (if anything)
>> has changed, or why :-/
>>
>>
>> Incidentally... the documentation doesn't seem to have any mention of UDP
>> whatsoever. I'm going to use this as an opportunity to dive into the
>> source - but perhaps it's worth improving the documentation around this?
>>
>> My next step is to try disabling track_activities and track_counts to
>> see if they improve matters any, but I wouldn't expect these to generate
>> enough data to flood the UDP buffers :-/
>>
>> Any ideas?
>>
>>
>>
>>
