Re: pg_stat_activity.waiting_start

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Joel Jacobson <joel(at)trustly(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_stat_activity.waiting_start
Date: 2017-01-08 01:04:04
Message-ID: a548f389-918b-1c29-391f-1f4edd017eba@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12/28/16 10:26 PM, Robert Haas wrote:
> On Wed, Dec 28, 2016 at 1:06 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> writes:
>>> On 12/28/16 11:25 AM, Tom Lane wrote:
>>>> The idea of just capturing the wait start for heavyweight locks, and
>>>> not other lock types, still seems superior to any of the alternatives
>>>> that have been suggested ...
>>
>>> Is some kind of alarm a viable option for the others? If setting the
>>> alarm is cheap,
>>
>> Setting an alarm is certain to require a gettimeofday and/or a kernel
>> call. It is far from cheap.
>
> If one were OK with a really-really-rough value for when the wait
> started, you could have a slot for a timestamp in PGPROC that is
> cleared by pgstat_report_wait_end() but not set by
> pgstat_report_wait_start(). Then you could have a background process
> that goes through and sets it for all processes advertising a wait
> event every second, or every 10 seconds, or every half second, or
> whatever you want. Of course this doesn't eliminate the overhead but
> it pushes it into the background which, if there are idle CPUs, is
> almost as good.
>
> I previously proposed something similar to this as a way of profiling
> waits and I believe you weren't very supportive of it, but I still
> think these kinds of ideas have some legs. We can neither take the
> approach that timestamp overhead is irrelevant nor the position that
> timestamps are expensive so let's not have any. Some third way has to
> be found.

I like that idea as a compromise. I've spent a lot of time looking at
systems with no process > 90% CPU and IO < 90% yet individual backends
being slow and wished I had some idea of what the backend was doing that
wasn't being accounted for.

BTW, instead of a flag I'd consider using a counter. That would allow
the background process to notice if backends were going into waits that
ultimately got resolved in between scans of PGPROC. That's obviously not
useful for pg_stat_activity, but it would be meaningful for anything
that was trying to accumulate wait time stats.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Nasby 2017-01-08 01:09:22 Re: pg_stat_activity.waiting_start
Previous Message Jim Nasby 2017-01-08 00:52:39 Re: increasing the default WAL segment size