Re: Many DataFileRead - IO waits

From: Ben Snaidero <bensnaidero(at)geotab(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: "pgsql-performa(dot)" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Many DataFileRead - IO waits
Date: 2020-03-02 22:39:37
Message-ID: CAEPE5bPn=1n+=QTQYG-+qdquW_Pf9ekycSfDjk+bNKpHAte7sw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Sat, Feb 29, 2020 at 11:22 AM Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Thu, Feb 27, 2020 at 11:33 AM Ben Snaidero <bensnaidero(at)geotab(dot)com>
> wrote:
>
>
>> I have the following query that was on average running in ~2ms suddenly
>> jump up to on average ~25ms.
>>
>
> What are you averaging over? The plan you show us is slow enough that if
> you were averaging over the last 1000 executions, that one execution could
> skew the entire average just by itself. When individual execution times
> can vary over 4 powers of 10, I don't think averages are a very good way of
> analyzing things.
>
>
>
>> This query is called millions of time per day and there were cases of the
>> query taking 20-30 seconds. Below is the explain analyze of one such
>> example.
>> When seeing this issue, the server was under some CPU pressure but even
>> with that, I would not think it should get as slow as shown below as we are
>> using SSDs and none of the windows disk counters (IOPS, queue length) show
>> any value that would be of concern.
>>
>
> What is the average and expected random read latency on your SSDs? Have
> you benchmarked them (outside of the database system) to see if they are
> performing as expected?
>
>
>> Rows Removed by Filter: 91686
>> Buffers: shared hit=12102 read=13259 written=111
>>
>
> Do the faster executions have fewer rows removed by the filter (and fewer
> buffers read), or are they just faster despite having about the same values?
>
>
>
>> We took a perfview during the issue and below is the call stack from a
>> process running this query, two call paths are shown.
>>
>
> I've never used perfview. But if I try to naively interpret it similar to
> gdb backtrace, it doesn't make much sense to me. InitBufTable is only
> called by "postmaster" while starting the database, how could it be part of
> call paths during regular operations? Are these views of the slow-running
> back end itself, or of some other postgresql process which was idle at the
> time the snapshot was taken?
>
> Cheers,
>
> Jeff
>

Query statistics were averaged over ~3million calls so I don't think a
single outlier would skew the results too much.

The perfview call stack is similar to gdb backtrace. I am 99% sure that
this call path is from the backend running this query as we queried
pg_stat_activity at the time of the perfview and cross-referenced the
PIDs. That said I am going to try building on windows with debug symbols
enabled and see if I can use gdb to debug and confirm.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message legrand legrand 2020-03-02 23:09:53 Re: Many DataFileRead - IO waits
Previous Message Ben Snaidero 2020-03-02 22:31:02 Re: Many DataFileRead - IO waits