Re: pgstat wait timeout encountered

From: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
To: Mark Steben <mark(dot)steben(at)drivedominion(dot)com>
Cc: John Scalia <jayknowsunix(at)gmail(dot)com>, pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: pgstat wait timeout encountered
Date: 2015-03-10 12:52:57
Message-ID: CAECtzeUbhXbU9MmtEQkZmr6i8QzLiw6_QJ-Q=hB_BxX8OydoAQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

2015-03-10 13:27 GMT+01:00 Mark Steben <mark(dot)steben(at)drivedominion(dot)com>:

> explain analyze doesn't reveal much:
>
> explain analyze SELECT * FROM pg_stat_database WHERE
> datname='prime_production';
> QUERY
> PLAN
>
> ----------------------------------------------------------------------------------------------------------
> Seq Scan on pg_database d (cost=0.00..1.11 rows=1 width=68) (actual
> time=11.201..11.203 rows=1 loops=1)
> Filter: (datname = 'prime_production'::name)
> Rows Removed by Filter: 4
> Total runtime: 11.320 ms
> (4 rows)
>
> explain on another 'offending' query also isn't very helpful
>
> explain analyze select * from pg_stat_bgwriter
> ;
> QUERY
> PLAN
>
> --------------------------------------------------------------------------------------
> Result (cost=0.00..0.04 rows=1 width=0) (actual time=10.771..10.771
> rows=1 loops=1)
> Total runtime: 10.794 ms
>
>
> I've performed a pg_stat_reset this morning. That hasn't been done in
> awhile.
> We'll see if that makes a difference
>
>
That usually doesn't make a difference. "pgstat wait timeout" is the usual
message when your disks are way too slow.

> On Mon, Mar 9, 2015 at 4:47 PM, John Scalia <jayknowsunix(at)gmail(dot)com>
> wrote:
>
>> Have you tried doing an "explain" on your query. Returning that many
>> rows sounds like potential corruption.
>>
>>
>> On 3/9/2015 4:39 PM, Mark Steben wrote:
>>
>> only 5 rows on this table although huge amount of returned tuples on one
>> db:
>>
>> email_web_service=# select * from pg_stat_database;
>> -[ RECORD 1 ]--+------------------------------
>> datid | 1
>> datname | template1
>> numbackends | 0
>> xact_commit | 192026
>> xact_rollback | 0
>> blks_read | 610450
>> blks_hit | 7723329
>> tup_returned | 33702423
>> tup_fetched | 3880133
>> tup_inserted | 7620
>> tup_updated | 291846
>> tup_deleted | 7620
>> conflicts | 0
>> temp_files | 0
>> temp_bytes | 0
>> deadlocks | 0
>> blk_read_time | 0
>> blk_write_time | 0
>> stats_reset | 2013-12-17 11:30:58.890718-05
>> -[ RECORD 2 ]--+------------------------------
>> datid | 12865
>> datname | template0
>> numbackends | 0
>> xact_commit | 48676
>> xact_rollback | 0
>> blks_read | 16904
>> blks_hit | 1321868
>> tup_returned | 15783033
>> tup_fetched | 274467
>> tup_inserted | 0
>> tup_updated | 0
>> tup_deleted | 0
>> conflicts | 0
>> temp_files | 0
>> temp_bytes | 0
>> deadlocks | 0
>> blk_read_time | 0
>> blk_write_time | 0
>> stats_reset | 2014-03-11 06:56:00.139609-04
>> -[ RECORD 3 ]--+------------------------------
>> datid | 12870
>> datname | postgres
>> numbackends | 0
>> xact_commit | 809894
>> xact_rollback | 30
>> blks_read | 959513
>> blks_hit | 59209984
>> tup_returned | 144866984
>> tup_fetched | 37242299
>> tup_inserted | 7726
>> tup_updated | 292637
>> tup_deleted | 7620
>> conflicts | 0
>> temp_files | 0
>> temp_bytes | 0
>> deadlocks | 0
>> blk_read_time | 0
>> blk_write_time | 0
>> stats_reset | 2013-10-17 20:13:08.614236-04
>> -[ RECORD 4 ]--+------------------------------
>> datid | 1998756
>> datname | prime_production
>> numbackends | 198
>> xact_commit | 1904931675
>> xact_rollback | 8354053
>> blks_read | 4085207044
>> blks_hit | 1384556112952
>> tup_returned | 22953800158363
>> tup_fetched | 2347909884098
>> tup_inserted | 2977265442
>> tup_updated | 252247999
>> tup_deleted | 35834382
>> conflicts | 0
>> temp_files | 25119
>> temp_bytes | 1456041951904
>> deadlocks | 43
>> blk_read_time | 0
>> blk_write_time | 0
>> stats_reset | 2015-01-27 11:54:14.737594-05
>> -[ RECORD 5 ]--+------------------------------
>> datid | 3297782
>> datname | email_web_service
>> numbackends | 3
>> xact_commit | 46
>> xact_rollback | 0
>> blks_read | 4
>> blks_hit | 2764
>> tup_returned | 566
>> tup_fetched | 336
>> tup_inserted | 8
>> tup_updated | 24
>> tup_deleted | 8
>> conflicts | 0
>> temp_files | 0
>> temp_bytes | 0
>> deadlocks | 0
>> blk_read_time | 0
>> blk_write_time | 0
>> stats_reset | 2015-03-09 16:36:12.727615-04
>>
>>
>> On Mon, Mar 9, 2015 at 4:21 PM, John Scalia <jayknowsunix(at)gmail(dot)com>
>> wrote:
>>
>>> Not to sound stupid, but two possible issues:
>>>
>>> 1) this is a really big table and
>>> 2) you have no index on the field "datname"
>>>
>>> Possible?
>>> Jay
>>>
>>>
>>> On 3/9/2015 3:50 PM, Mark Steben wrote:
>>>
>>> Good afternoon,
>>>
>>> We have been sporadically receiving a warning: *pgstat wait timeout *
>>> in our production environment. This has been always immediately
>>> followed by
>>> a successful execution of the query.
>>>
>>> EX:
>>>
>>> Mar 6 00:31:29 ardbc01 postgres[42903]: [3-1]
>>> mavmail(at)10(dot)93(dot)156(dot)42:prime_production:0WARNING: pgstat wait timeout
>>> Mar 6 00:31:29 ardbc01 postgres[42903]: [3-2]
>>> mavmail(at)10(dot)93(dot)156(dot)42:prime_production:0STATEMENT: SELECT * FROM
>>> pg_stat_database WHERE datname='prime_production';
>>> Mar 6 00:31:29 ardbc01 postgres[42903]: [4-1]
>>> mavmail(at)10(dot)93(dot)156(dot)42:prime_production:0LOG: duration: 10137.143 ms
>>> statement: SELECT * FROM pg_stat_database WHERE datname='prime_production';
>>>
>>> and
>>>
>>> mavmail(at)10(dot)93(dot)156(dot)42:prime_production:0WARNING: pgstat wait timeout
>>> Mar 6 00:31:46 ardbc01 postgres[42903]: [6-2]
>>> mavmail(at)10(dot)93(dot)156(dot)42:prime_production:0STATEMENT: SELECT
>>> SUM(idx_blks_hit) AS hits, SUM(idx_blks_read) AS reads FROM
>>> pg_statio_user_indexes
>>> Mar 6 00:31:46 ardbc01 postgres[42903]: [7-1]
>>> mavmail(at)10(dot)93(dot)156(dot)42:prime_production:0LOG: duration: 10144.601 ms
>>> statement: SELECT SUM(idx_blks_hit) AS hits, SUM(idx_blks_read) AS reads
>>> FROM pg_statio_user_indexes
>>>
>>> This is not confined to queries of pg stat infrastructure tables. In
>>> another database we can receive the same warning on a vacuum:
>>>
>>> mavmail(at)10(dot)93(dot)156(dot)45:mavmail WARNING: pgstat wait timeout
>>> Mar 6 01:16:44 ardbc01 postgres[66365]: [12-2]
>>> mavmail(at)10(dot)93(dot)156(dot)45:mavmail STATEMENT: VACUUM VERBOSE ANALYZE
>>> update_queues;
>>> Mar 6 01:16:44 ardbc01 postgres[66365]: [13-1]
>>> mavmail(at)10(dot)93(dot)156(dot)45:mavmail INFO: vacuuming "public.update_queues"
>>> Mar 6 01:16:44 ardbc01 postgres[66365]: [13-2]
>>> mavmail(at)10(dot)93(dot)156(dot)45:mavmail STATEMENT: VACUUM VERBOSE ANALYZE
>>> update_queues;
>>> Mar 6 01:16:44 ardbc01 postgres[66365]: [14-1]
>>> mavmail(at)10(dot)93(dot)156(dot)45:mavmail INFO: index "uq_qid" now contains 0 row
>>> versions in 7 pages
>>> Mar 6 01:16:44 ardbc01 postgres[66365]: [14-2]
>>> mavmail(at)10(dot)93(dot)156(dot)45:mavmail DETAIL: 0 index row versions were removed.
>>> Mar 6 01:16:44 ardbc01 postgres[66365]: [14-3] #0114 index pages have
>>> been deleted, 4 are currently reusable.
>>> Mar 6 01:16:44 ardbc01 postgres[66365]: [14-4] #011CPU 0.00s/0.00u sec
>>> elapsed 0.00 sec.
>>> Mar 6 01:16:44 ardbc01 postgres[66365]: [14-5]
>>> mavmail(at)10(dot)93(dot)156(dot)45:mavmail STATEMENT: VACUUM VERBOSE ANALYZE
>>> update_queues;
>>> Mar 6 01:16:44 ardbc01 postgres[66365]: [15-1]
>>> mavmail(at)10(dot)93(dot)156(dot)45:mavmail INFO: "update_queues": found 0 removable,
>>> 0 nonremovable row versions in 0 out of 0 pages
>>> Mar 6 01:16:44 ardbc01 postgres[66365]: [15-2]
>>> mavmail(at)10(dot)93(dot)156(dot)45:mavmail DETAIL: 0 dead row versions cannot be
>>> removed yet.
>>> Mar 6 01:16:44 ardbc01 postgres[66365]: [15-3] #011There were 0 unused
>>> item pointers.
>>> Mar 6 01:16:44 ardbc01 postgres[66365]: [15-4] #0110 pages are entirely
>>> empty.
>>> Mar 6 01:16:44 ardbc01 postgres[66365]: [15-5] #011CPU 0.00s/0.00u sec
>>> elapsed 0.00 sec.
>>> Mar 6 01:16:44 ardbc01 postgres[66365]: [15-6]
>>> mavmail(at)10(dot)93(dot)156(dot)45:mavmail STATEMENT: VACUUM VERBOSE ANALYZE
>>> update_queues;
>>>
>>> I did check older posts on pgadmin - one insight involved the possible
>>> inability of autovacuum to grab the latest stats as the stats collector
>>> might have gotten stuck in a loop and could not update the stats file.
>>> That was from an older PSQL version and we don't employ autovacuum in these
>>> databases.
>>>
>>> Any insights appreciated:
>>> postgres version: 9.2.5
>>>
>>> platform:
>>> Linux ardbc01 2.6.32-358.18.1.el6.x86_64 #1 SMP Wed Aug 28 17:19:38
>>> UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> Thank you,
>>>
>>>
>>> *Mark Steben*
>>> Database Administrator
>>> @utoRevenue <http://www.autorevenue.com/> | Autobase
>>> <http://www.autobase.net/>
>>> CRM division of Dominion Dealer Solutions
>>> 95D Ashley Ave.
>>> West Springfield, MA 01089
>>> t: 413.327-3045
>>> f: 413.383-9567
>>>
>>> www.fb.com/DominionDealerSolutions
>>> www.twitter.com/DominionDealer
>>> www.drivedominion.com <http://www.autorevenue.com/>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> *Mark Steben*
>> Database Administrator
>> @utoRevenue <http://www.autorevenue.com/> | Autobase
>> <http://www.autobase.net/>
>> CRM division of Dominion Dealer Solutions
>> 95D Ashley Ave.
>> West Springfield, MA 01089
>> t: 413.327-3045
>> f: 413.383-9567
>>
>> www.fb.com/DominionDealerSolutions
>> www.twitter.com/DominionDealer
>> www.drivedominion.com <http://www.autorevenue.com/>
>>
>>
>>
>>
>>
>>
>>
>
>
> --
> *Mark Steben*
> Database Administrator
> @utoRevenue <http://www.autorevenue.com/> | Autobase
> <http://www.autobase.net/>
> CRM division of Dominion Dealer Solutions
> 95D Ashley Ave.
> West Springfield, MA 01089
> t: 413.327-3045
> f: 413.383-9567
>
> www.fb.com/DominionDealerSolutions
> www.twitter.com/DominionDealer
> www.drivedominion.com <http://www.autorevenue.com/>
>
> <http://autobasedigital.net/marketing/DD12_sig.jpg>
>
>
>
>

--
Guillaume.
http://blog.guillaume.lelarge.info
http://www.dalibo.com

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Rossi, Maria 2015-03-11 01:02:53 Relocated Postgres
Previous Message John Scalia 2015-03-10 12:49:52 Re: pgstat wait timeout encountered