From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Better way of dealing with pgstat wait timeout during buildfarm runs? |
Date: | 2015-01-20 16:08:16 |
Message-ID: | 54BE7D70.7050606@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 25.12.2014 22:28, Tomas Vondra wrote:
> On 25.12.2014 21:14, Andres Freund wrote:
>
>> That's indeed odd. Seems to have been lost when the statsfile was
>> split into multiple files. Alvaro, Tomas?
>
> The goal was to keep the logic as close to the original as possible.
> IIRC there were "pgstat wait timeout" issues before, and in most cases
> the conclusion was that it's probably because of overloaded I/O.
>
> But maybe there actually was another bug, and it's entirely possible
> that the split introduced a new one, and that's what we're seeing now.
> The strange thing is that the split happened ~2 years ago, which is
> inconsistent with the sudden increase of this kind of issues. So maybe
> something changed on that particular animal (a failing SD card causing
> I/O stalls, perhaps)?
>
> Anyway, I happen to have a spare Raspberry PI, so I'll try to reproduce
> and analyze the issue locally. But that won't happen until January.
I've tried to reproduce this on my Raspberry PI 'machine' and it's not
very difficult to trigger this. About 7 out of 10 'make check' runs fail
because of 'pgstat wait timeout'.
All the occurences I've seen were right after some sort of VACUUM
(sometimes plain, sometimes ANALYZE or FREEZE), and the I/O at the time
looked something like this:
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
avgrq-sz avgqu-sz await r_await w_await svctm %util
mmcblk0 0.00 75.00 0.00 8.00 0.00 36.00
9.00 5.73 15633.75 0.00 15633.75 125.00 100.00
So pretty terrible (this is a Class 4 SD card, supposedly able to handle
4 MB/s). If hamster had faulty SD card, it might have been much worse, I
guess.
This of course does not prove the absence of a bug - I plan to dig into
this a bit more. Feel free to point out some suspicious scenarios that
might be worth reproducing and analyzing.
--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2015-01-20 16:08:54 | Re: Merging postgresql.conf and postgresql.auto.conf |
Previous Message | Robert Haas | 2015-01-20 15:54:25 | Re: B-Tree support function number 3 (strxfrm() optimization) |