Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?
Date: 2014-12-25 21:28:26
Message-ID: 549C817A.6010804@fuzzy.cz
Lists: pgsql-hackers

On 25.12.2014 21:14, Andres Freund wrote:
> On 2014-12-25 14:36:42 -0500, Tom Lane wrote:
>
> My guess is that a checkpoint happened at that time. Maybe it'd be a
> good idea to make pg_regress start postgres with log_checkpoints
> enabled? My guess is that we'd find horrendous 'sync' times.
>
> Michael: Could you perhaps turn log_checkpoints on in the config?

Logging timestamps (using log_line_prefix) would also be helpful.
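
Something along these lines in postgresql.conf should be enough. This is
only a sketch: the particular prefix format is an assumption about what
would be useful here, not something anyone has agreed on in this thread.

```
# log each checkpoint, including its write and sync times
log_checkpoints = on

# prefix each log line with a millisecond timestamp (%m) and the PID (%p)
log_line_prefix = '%m [%p] '
```

With both set, the checkpoint log entries can be lined up against the
time of the "pgstat wait timeout" warnings.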

>
>> BTW, I notice that in the current state of pgstat.c, all the logic
>> for keeping track of request arrival times is dead code, because
>> nothing is actually looking at DBWriteRequest.request_time. This
>> makes me think that somebody simplified away some logic we maybe
>> should have kept. But if we're going to leave it like this, we
>> could replace the DBWriteRequest data structure with a simple OID
>> list and save a fair amount of code.
>
> That's indeed odd. Seems to have been lost when the statsfile was
> split into multiple files. Alvaro, Tomas?

The goal was to keep the logic as close to the original as possible.
IIRC there were "pgstat wait timeout" issues before, and in most cases
the conclusion was that it's probably because of overloaded I/O.

But maybe there actually was another bug, and it's entirely possible
that the split introduced a new one, and that's what we're seeing now.
The strange thing is that the split happened ~2 years ago, which does
not fit with the sudden increase in this kind of issue. So maybe
something changed on that particular animal (a failing SD card causing
I/O stalls, perhaps)?

Anyway, I happen to have a spare Raspberry Pi, so I'll try to reproduce
and analyze the issue locally. But that won't happen until January.

> I wondered for a second whether the split could be responsible
> somehow, but there's reports of that in older backbranches as well:
> http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=mereswine&dt=2014-12-23%2019%3A17%3A41

Tomas
