Re: corrupted statistics file "pg_stat_tmp/pgstat.stat"

From: "Carl von Clausewitz" <clausewitz45(at)gmail(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Re: corrupted statistics file "pg_stat_tmp/pgstat.stat"
Date: 2012-08-15 19:19:04
Message-ID: !&!AAAAAAAAAAAuAAAAAAAAAAWTd+JxKBFOjwezc1xxLcwBAEi3FOTIpilNouXM9QUKMvAAAAA/8FYAABAAAAAdJG6V3mlKQKszGpkA/i1pAQAAAAA=@gmail.com
Lists: pgsql-general

Hi,

I've made the kernel changes that I described in my original e-mail, and I've set up some additional logging (both csvlog and syslog) to gather more information.

/boot/loader.conf:

kern.ipc.semmni="512"
kern.ipc.semmns="1024"
kern.ipc.semume="64"
kern.ipc.semmnu="512"

/etc/sysctl.conf:

kern.ipc.shmall=262144
kern.ipc.shmmax=1073742336
kern.ipc.semmap=256

pgTune made these config changes for me in /usr/local/pgsql/data/postgresql.conf (the server has 4GB RAM):
default_statistics_target = 50 # pgtune wizard 2012-08-15
maintenance_work_mem = 240MB # pgtune wizard 2012-08-15
constraint_exclusion = on # pgtune wizard 2012-08-15
checkpoint_completion_target = 0.9 # pgtune wizard 2012-08-15
effective_cache_size = 2816MB # pgtune wizard 2012-08-15
work_mem = 24MB # pgtune wizard 2012-08-15
wal_buffers = 8MB # pgtune wizard 2012-08-15
checkpoint_segments = 16 # pgtune wizard 2012-08-15
shared_buffers = 960MB # pgtune wizard 2012-08-15
max_connections = 80 # pgtune wizard 2012-08-15
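
As a rough sanity check on the numbers above, the sketch below compares the kernel shared memory limits with the two largest shared memory settings from postgresql.conf. It is only an approximation under stated assumptions: the 4096-byte page size is assumed, and PostgreSQL's real shared memory segment also contains other structures that are not counted here, so the true headroom is smaller than what this computes.

```python
MIB = 1024 * 1024


def parse_size(value: str) -> int:
    """Convert a postgresql.conf memory value such as '960MB' to bytes."""
    units = {"kB": 1024, "MB": MIB, "GB": 1024 * MIB}
    for suffix, factor in units.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    raise ValueError(f"no recognised unit suffix in {value!r}")


# Values copied from the settings quoted in this message.
shmmax = 1073742336          # kern.ipc.shmmax, in bytes
shmall_pages = 262144        # kern.ipc.shmall, in pages
page_size = 4096             # ASSUMPTION: page size is not stated in the message

shared_buffers = parse_size("960MB")
wal_buffers = parse_size("8MB")

# Lower bound on what PostgreSQL will request; the real segment is larger.
requested_lower_bound = shared_buffers + wal_buffers
headroom = shmmax - requested_lower_bound

print(f"shmmax            : {shmmax / MIB:.2f} MiB")
print(f"shmall (in bytes) : {shmall_pages * page_size / MIB:.2f} MiB")
print(f"buffers (minimum) : {requested_lower_bound / MIB:.2f} MiB")
print(f"headroom (maximum): {headroom / MIB:.2f} MiB")
```

If the headroom came out negative, the server would normally refuse to start at all, so this check mainly rules out the shared memory settings as the source of the stats file warnings.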

After a day, the file is 412 kB in size. I've just installed strace, and I'll try to capture 2-4 hours of activity to see what is going on.

ulimit (& ulimit -f) output is unlimited.

I'll be back (:-)) within a few days with the results. Thank you all for the information.

Regards,
Csaba

-----Original Message-----
From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
Sent: Wednesday, August 15, 2012 3:34 PM
To: Carl von Clausewitz
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: [GENERAL] corrupted statistics file "pg_stat_tmp/pgstat.stat"

"Carl von Clausewitz" <clausewitz45(at)gmail(dot)com> writes:
> I’ve restored our databases from a TAR backup, and everything looked fine. Without changing any setting in postgresql.conf (or in the kernel settings) other than “track_counts=on”, after 2-3 days I’m receiving a huge number (~5-10 per second) of error messages like this in /var/log/postgresql.log:
> *** Aug 15 06:27:26 eurodb postgres[77652]: [43-1] WARNING: corrupted statistics file "pg_stat_tmp/pgstat.stat"

Huh. The stats collector process ought to rewrite that file fairly often, so this suggests it's consistently failing to rewrite it.

You might take a look at what the file looks like after a day or so of normal operation (eg, how big is it, how often does it get updated) and then compare to what it looks like after the errors start.
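
One way to collect that size and update-frequency data is to poll the file and record each change. The following is a minimal sketch, not an official tool: the function names `snapshot` and `watch` are made up for illustration, and the path used with them would have to match the actual data directory and stats_temp_directory setting.

```python
import os
import time


def snapshot(path):
    """Return (size_in_bytes, mtime) for path, or None if the file is missing."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return st.st_size, st.st_mtime


def watch(path, interval=1.0, samples=10):
    """Poll path `samples` times, `interval` seconds apart.

    Returns a list of (observed_at, state) tuples, one per observed change,
    where state is the value returned by snapshot().
    """
    changes = []
    last = object()  # sentinel that never equals a real snapshot
    for _ in range(samples):
        current = snapshot(path)
        if current != last:
            changes.append((time.time(), current))
            last = current
        time.sleep(interval)
    return changes
```

For example, `watch("/usr/local/pgsql/data/pg_stat_tmp/pgstat.stat", interval=0.5, samples=7200)` would cover about an hour; that path is an assumption based on the data directory mentioned earlier in the thread. Comparing the change list from normal operation with one taken after the warnings start should show whether the file stops being rewritten or changes size abruptly.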

Also, try strace'ing the stats collector process for a little while (long enough to capture a stats file rewrite sequence) during normal operation, and then again after the errors start.

I don't want to speculate too much in advance of the data, but I'm wondering about a ulimit setting that limits how much data the stats collector can write during its lifetime (ulimit -f or local equivalent).
That would eventually cause problems for any postgres process, but if you did accidentally have one in place when starting the postmaster, maybe the stats collector would be first to show symptoms.
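
To see the file-size limit as a process actually inherits it, a small script can query it directly. This is a sketch using Python's standard `resource` module (Unix only); the helper name `describe_fsize_limit` is made up for illustration. It reports the limit of the process that runs it, so it is only meaningful when started as the postgres user from the same environment that launches the postmaster.

```python
import resource


def describe_fsize_limit():
    """Return the (soft, hard) RLIMIT_FSIZE values as readable strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_FSIZE)

    def fmt(value):
        # RLIM_INFINITY means no limit is in effect.
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        return f"{value} bytes"

    return fmt(soft), fmt(hard)


soft, hard = describe_fsize_limit()
print(f"file size limit: soft={soft}, hard={hard}")
```

Note that getrlimit reports bytes, while the shell's ulimit -f usually reports blocks whose size depends on the shell, so the two numbers may differ by a constant factor.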

regards, tom lane
