From: | Tomas Vondra <tv(at)fuzzy(dot)cz> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: BUG #12071: Stat collector went crasy (50MB/s constant writes) |
Date: | 2014-11-26 20:45:12 |
Message-ID: | 54763BD8.8020906@fuzzy.cz |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs pgsql-hackers |
On 26.11.2014 21:07, Maxim Boguk wrote:
>
> On one of my servers stat collector went crasy on idle system.
> It constantly write around 50MB/s on idle database (no
> activity at all).
>
>
> OK there are results of some entry-level gdb kung-fu:
>
> problem point is:
> /*
> * Write the stats file if a new request has arrived
> that is not
> * satisfied by existing file.
> */
> if (last_statwrite < last_statrequest)
> pgstat_write_statsfile(false);
>
> on some loop iteration difference between last_statrequest and
> last_statwrite is around 23000-29000 (e.g. 20-30ms).
>
> Hm may be there some clock drift between CPU cores?
>
>
> What's made this situation interesting that the fact the code never got
> inside
> /*
> * If there is clock skew between backends and the collector, we
> could
> * receive a stats request time that's in the future. If so,
> complain
> * and reset last_statrequest. Resetting ensures that no inquiry
> * message can cause more than one stats file write to occur.
> */
> if (last_statrequest > last_statwrite)
> ...
> block.
That suggests there's no (significant) clock skew between postmaster and
the process requesting the file.
> Watchpoint on the (last_statwrite < last_statrequest) condition
> repetable fire ONLY inside:
> pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
> {
> if (msg->inquiry_time > last_statrequest)
> last_statrequest = msg->inquiry_time;
> }
> function.
OK, that's expected, because the request is sent only if the file is
older than X miliseconds (10ms for autovacuum, 640ms otherwise). So it
would be strange if the condition didn't fire here.
> Now just some musings/theory (I might be dead wrong of course).
> Lets suppose there are no clock skew actually and check execution flow.
>
> Main loop executed:
> if (last_statwrite < last_statrequest)
> pgstat_write_statsfile(false);
> and updated last_statwrite value.
> However, it took some time because of file size.
OK. FWIW, it's updated to GetCurrentTimestamp() before starting to write
the file.
> During that period (while pgstat busy with writes) some backend issued
> PGSTAT_MTYPE_INQUIRY request which will be succesfuly accepted on the
> next code lines at
> len = recv(pgStatSock, (char *) &msg,...
> And as a result the last_statrequest will be updated to the
> msg->inquiry_time value.
> And this value will be definitely > than last_statwrite (because this
> request issued after pgstat worker stared writing new copy of the stat
> file).
How many backends are there, requesting a fresh stats file? I thought
you mentioned the system is rather idle?
A regular backend won't request a fresh copy unless the current file is
older than 640 ms. So that can't cause ~50MB of writes (it'd require 10
writes per second). Maybe there are multiple backends sending such
requests, but I don't see how that could cause a continuous write load
(I'd understand a short spike at the beginning, and then getting
synchronized).
Autovacuum launcher / workers might cause that, because it's using 10 ms
threshold. But that'd require a lot of autovacuum processes running in
parallel, I guess.
I wonder if this could be caused by a long "queue" of such inquiries,
but that should be filtered out by the (last_statwrite <
last_statrequest) condition.
regards
Tomas
From | Date | Subject | |
---|---|---|---|
Next Message | Maxim Boguk | 2014-11-26 21:19:32 | Re: BUG #12071: Stat collector went crasy (50MB/s constant writes) |
Previous Message | Jeff Janes | 2014-11-26 20:29:59 | Re: BUG #12071: Stat collector went crasy (50MB/s constant writes) |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2014-11-26 20:48:42 | Re: bug in json_to_record with arrays |
Previous Message | Peter Eisentraut | 2014-11-26 20:45:06 | Re: [pgsql-packagers] Palle Girgensohn's ICU patch |