Re: pgstat wait timeout encountered

From: John Scalia <jayknowsunix(at)gmail(dot)com>
To: Mark Steben <mark(dot)steben(at)drivedominion(dot)com>
Cc: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: pgstat wait timeout encountered
Date: 2015-03-09 20:47:58
Message-ID: 54FE06FE.5000904@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Have you tried doing an "explain" on your query. Returning that many rows sounds like potential corruption.

On 3/9/2015 4:39 PM, Mark Steben wrote:
> only 5 rows on this table although huge amount of returned tuples on one db:
>
> email_web_service=# select * from pg_stat_database;
> -[ RECORD 1 ]--+------------------------------
> datid | 1
> datname | template1
> numbackends | 0
> xact_commit | 192026
> xact_rollback | 0
> blks_read | 610450
> blks_hit | 7723329
> tup_returned | 33702423
> tup_fetched | 3880133
> tup_inserted | 7620
> tup_updated | 291846
> tup_deleted | 7620
> conflicts | 0
> temp_files | 0
> temp_bytes | 0
> deadlocks | 0
> blk_read_time | 0
> blk_write_time | 0
> stats_reset | 2013-12-17 11:30:58.890718-05
> -[ RECORD 2 ]--+------------------------------
> datid | 12865
> datname | template0
> numbackends | 0
> xact_commit | 48676
> xact_rollback | 0
> blks_read | 16904
> blks_hit | 1321868
> tup_returned | 15783033
> tup_fetched | 274467
> tup_inserted | 0
> tup_updated | 0
> tup_deleted | 0
> conflicts | 0
> temp_files | 0
> temp_bytes | 0
> deadlocks | 0
> blk_read_time | 0
> blk_write_time | 0
> stats_reset | 2014-03-11 06:56:00.139609-04
> -[ RECORD 3 ]--+------------------------------
> datid | 12870
> datname | postgres
> numbackends | 0
> xact_commit | 809894
> xact_rollback | 30
> blks_read | 959513
> blks_hit | 59209984
> tup_returned | 144866984
> tup_fetched | 37242299
> tup_inserted | 7726
> tup_updated | 292637
> tup_deleted | 7620
> conflicts | 0
> temp_files | 0
> temp_bytes | 0
> deadlocks | 0
> blk_read_time | 0
> blk_write_time | 0
> stats_reset | 2013-10-17 20:13:08.614236-04
> -[ RECORD 4 ]--+------------------------------
> datid | 1998756
> datname | prime_production
> numbackends | 198
> xact_commit | 1904931675
> xact_rollback | 8354053
> blks_read | 4085207044
> blks_hit | 1384556112952
> tup_returned | 22953800158363
> tup_fetched | 2347909884098
> tup_inserted | 2977265442
> tup_updated | 252247999
> tup_deleted | 35834382
> conflicts | 0
> temp_files | 25119
> temp_bytes | 1456041951904
> deadlocks | 43
> blk_read_time | 0
> blk_write_time | 0
> stats_reset | 2015-01-27 11:54:14.737594-05
> -[ RECORD 5 ]--+------------------------------
> datid | 3297782
> datname | email_web_service
> numbackends | 3
> xact_commit | 46
> xact_rollback | 0
> blks_read | 4
> blks_hit | 2764
> tup_returned | 566
> tup_fetched | 336
> tup_inserted | 8
> tup_updated | 24
> tup_deleted | 8
> conflicts | 0
> temp_files | 0
> temp_bytes | 0
> deadlocks | 0
> blk_read_time | 0
> blk_write_time | 0
> stats_reset | 2015-03-09 16:36:12.727615-04
>
>
> On Mon, Mar 9, 2015 at 4:21 PM, John Scalia <jayknowsunix(at)gmail(dot)com <mailto:jayknowsunix(at)gmail(dot)com>> wrote:
>
> Not to sound stupid, but two possible issues:
>
> 1) this is a really big table and
> 2) you have no index on the field "datname"
>
> Possible?
> Jay
>
>
> On 3/9/2015 3:50 PM, Mark Steben wrote:
>> Good afternoon,
>>
>> We have been sporadically receiving a warning: /pgstat wait timeout /
>> in our production environment. This has been always immediately followed by
>> a successful execution of the query.
>>
>> EX:
>>
>> Mar 6 00:31:29 ardbc01 postgres[42903]: [3-1] mavmail(at)10(dot)93(dot)156(dot)42:prime_production:0WARNING <mailto:mavmail(at)10(dot)93(dot)156(dot)42:prime_production:0WARNING>: pgstat wait timeout
>> Mar 6 00:31:29 ardbc01 postgres[42903]: [3-2] mavmail(at)10(dot)93(dot)156(dot)42:prime_production:0STATEMENT <mailto:mavmail(at)10(dot)93(dot)156(dot)42:prime_production:0STATEMENT>: SELECT * FROM
>> pg_stat_database WHERE datname='prime_production';
>> Mar 6 00:31:29 ardbc01 postgres[42903]: [4-1] mavmail(at)10(dot)93(dot)156(dot)42:prime_production:0LOG <mailto:mavmail(at)10(dot)93(dot)156(dot)42:prime_production:0LOG>: duration: 10137.143 ms
>> statement: SELECT * FROM pg_stat_database WHERE datname='prime_production';
>>
>> and
>>
>> mavmail(at)10(dot)93(dot)156(dot)42:prime_production:0WARNING <mailto:mavmail(at)10(dot)93(dot)156(dot)42:prime_production:0WARNING>: pgstat wait timeout
>> Mar 6 00:31:46 ardbc01 postgres[42903]: [6-2] mavmail(at)10(dot)93(dot)156(dot)42:prime_production:0STATEMENT <mailto:mavmail(at)10(dot)93(dot)156(dot)42:prime_production:0STATEMENT>: SELECT
>> SUM(idx_blks_hit) AS hits, SUM(idx_blks_read) AS reads FROM pg_statio_user_indexes
>> Mar 6 00:31:46 ardbc01 postgres[42903]: [7-1] mavmail(at)10(dot)93(dot)156(dot)42:prime_production:0LOG <mailto:mavmail(at)10(dot)93(dot)156(dot)42:prime_production:0LOG>: duration: 10144.601 ms
>> statement: SELECT SUM(idx_blks_hit) AS hits, SUM(idx_blks_read) AS reads FROM pg_statio_user_indexes
>>
>> This is not confined to queries of pg stat infrastructure tables. In another database we can receive the same warning on a vacuum:
>>
>> mavmail(at)10(dot)93(dot)156(dot)45:mavmail <mailto:mavmail(at)10(dot)93(dot)156(dot)45:mavmail> WARNING: pgstat wait timeout
>> Mar 6 01:16:44 ardbc01 postgres[66365]: [12-2] mavmail(at)10(dot)93(dot)156(dot)45:mavmail <mailto:mavmail(at)10(dot)93(dot)156(dot)45:mavmail> STATEMENT: VACUUM VERBOSE ANALYZE update_queues;
>> Mar 6 01:16:44 ardbc01 postgres[66365]: [13-1] mavmail(at)10(dot)93(dot)156(dot)45:mavmail <mailto:mavmail(at)10(dot)93(dot)156(dot)45:mavmail> INFO: vacuuming "public.update_queues"
>> Mar 6 01:16:44 ardbc01 postgres[66365]: [13-2] mavmail(at)10(dot)93(dot)156(dot)45:mavmail <mailto:mavmail(at)10(dot)93(dot)156(dot)45:mavmail> STATEMENT: VACUUM VERBOSE ANALYZE update_queues;
>> Mar 6 01:16:44 ardbc01 postgres[66365]: [14-1] mavmail(at)10(dot)93(dot)156(dot)45:mavmail <mailto:mavmail(at)10(dot)93(dot)156(dot)45:mavmail> INFO: index "uq_qid" now contains 0 row versions in 7 pages
>> Mar 6 01:16:44 ardbc01 postgres[66365]: [14-2] mavmail(at)10(dot)93(dot)156(dot)45:mavmail <mailto:mavmail(at)10(dot)93(dot)156(dot)45:mavmail> DETAIL: 0 index row versions were removed.
>> Mar 6 01:16:44 ardbc01 postgres[66365]: [14-3] #0114 index pages have been deleted, 4 are currently reusable.
>> Mar 6 01:16:44 ardbc01 postgres[66365]: [14-4] #011CPU 0.00s/0.00u sec elapsed 0.00 sec.
>> Mar 6 01:16:44 ardbc01 postgres[66365]: [14-5] mavmail(at)10(dot)93(dot)156(dot)45:mavmail <mailto:mavmail(at)10(dot)93(dot)156(dot)45:mavmail> STATEMENT: VACUUM VERBOSE ANALYZE update_queues;
>> Mar 6 01:16:44 ardbc01 postgres[66365]: [15-1] mavmail(at)10(dot)93(dot)156(dot)45:mavmail <mailto:mavmail(at)10(dot)93(dot)156(dot)45:mavmail> INFO: "update_queues": found 0 removable, 0 nonremovable
>> row versions in 0 out of 0 pages
>> Mar 6 01:16:44 ardbc01 postgres[66365]: [15-2] mavmail(at)10(dot)93(dot)156(dot)45:mavmail <mailto:mavmail(at)10(dot)93(dot)156(dot)45:mavmail> DETAIL: 0 dead row versions cannot be removed yet.
>> Mar 6 01:16:44 ardbc01 postgres[66365]: [15-3] #011There were 0 unused item pointers.
>> Mar 6 01:16:44 ardbc01 postgres[66365]: [15-4] #0110 pages are entirely empty.
>> Mar 6 01:16:44 ardbc01 postgres[66365]: [15-5] #011CPU 0.00s/0.00u sec elapsed 0.00 sec.
>> Mar 6 01:16:44 ardbc01 postgres[66365]: [15-6] mavmail(at)10(dot)93(dot)156(dot)45:mavmail <mailto:mavmail(at)10(dot)93(dot)156(dot)45:mavmail> STATEMENT: VACUUM VERBOSE ANALYZE update_queues;
>>
>> I did check older posts on pgadmin - one insight involved the possible inability of autovacuum to grab the latest stats as the stats collector might have gotten stuck in a
>> loop and could not update the stats file. That was from an older PSQL version and we don't employ autovacuum in these databases.
>>
>> Any insights appreciated:
>> postgres version: 9.2.5
>>
>> platform:
>> Linux ardbc01 2.6.32-358.18.1.el6.x86_64 #1 SMP Wed Aug 28 17:19:38 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
>>
>> Thank you,
>>
>>
>> *Mark Steben*
>> Database Administrator
>> @utoRevenue <http://www.autorevenue.com/> | Autobase <http://www.autobase.net/>
>> CRM division of Dominion Dealer Solutions
>> 95D Ashley Ave.
>> West Springfield, MA 01089
>> t: 413.327-3045
>> f: 413.383-9567
>>
>> www.fb.com/DominionDealerSolutions <http://www.fb.com/DominionDealerSolutions>
>> www.twitter.com/DominionDealer <http://www.twitter.com/DominionDealer>
>> www.drivedominion.com <http://www.autorevenue.com/>
>>
>>
>>
>>
>>
>
>
>
>
> --
> *Mark Steben*
> Database Administrator
> @utoRevenue <http://www.autorevenue.com/> | Autobase <http://www.autobase.net/>
> CRM division of Dominion Dealer Solutions
> 95D Ashley Ave.
> West Springfield, MA 01089
> t: 413.327-3045
> f: 413.383-9567
>
> www.fb.com/DominionDealerSolutions <http://www.fb.com/DominionDealerSolutions>
> www.twitter.com/DominionDealer <http://www.twitter.com/DominionDealer>
> www.drivedominion.com <http://www.autorevenue.com/>
>
>
>
>
>

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Mark Steben 2015-03-10 12:27:19 Re: pgstat wait timeout encountered
Previous Message Mark Steben 2015-03-09 20:39:48 Re: pgstat wait timeout encountered