Re: pg_stat_tmp

From: Tim Kane <tim(dot)kane(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-general General <pgsql-general(at)postgresql(dot)org>
Subject: Re: pg_stat_tmp
Date: 2013-12-17 11:25:06
Message-ID: CED5DA42.51EDB%tim.kane@gmail.com
Lists: pgsql-general


Thanks for the suggestions, Jeff and Magnus.
This morning the same issue occurred, but this time it also complained that
it failed to write to pg_clog (on previous days, the only failure was in
writing to pg_stat_tmp).

2013-12-17 07:13:04 GMT DETAIL: Could not write to file "pg_clog/0004" at
offset 212992: No space left on device.
2013-12-17 07:13:05 GMT ERROR: could not access status of transaction 0
2013-12-17 07:13:05 GMT DETAIL: Could not write to file "pg_clog/0004" at
offset 212992: No space left on device.
2013-12-17 07:13:06 GMT ERROR: could not access status of transaction 0
2013-12-17 07:13:06 GMT DETAIL: Could not write to file "pg_clog/0004" at
offset 212992: No space left on device.
2013-12-17 07:13:07 GMT LOG: could not write temporary statistics file
"pg_stat_tmp/pgstat.tmp": No space left on device
2013-12-17 07:13:07 GMT LOG: could not write temporary statistics file
"pg_stat_tmp/pgstat.tmp": No space left on device
2013-12-17 07:13:08 GMT LOG: could not write temporary statistics file
"pg_stat_tmp/pgstat.tmp": No space left on device
2013-12-17 07:13:08 GMT LOG: could not write temporary statistics file
"pg_stat_tmp/pgstat.tmp": No space left on device
2013-12-17 07:25:15 GMT WARNING: terminating connection because of crash of
another server process
2013-12-17 07:25:15 GMT DETAIL: The postmaster has commanded this server
process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared memory.
2013-12-17 07:25:15 GMT HINT: In a moment you should be able to reconnect
to the database and repeat your command.
2013-12-17 07:25:16 GMT LOG: all server processes terminated;
reinitializing
2013-12-17 07:25:17 GMT LOG: database system was interrupted; last known up
at 2013-12-17 07:08:22 GMT
2013-12-17 07:25:17 GMT LOG: database system was not properly shut down;
automatic recovery in progress
2013-12-17 07:25:17 GMT LOG: redo starts at 5BC/CF3F9888
2013-12-17 07:25:18 GMT LOG: could not open file
"pg_xlog/00000001000005BC000000D6" (log file 1468, segment 214): No such
file or directory
2013-12-17 07:25:18 GMT LOG: redo done at 5BC/D5FFFEC0
2013-12-17 07:25:18 GMT LOG: last completed transaction was at log time
2013-12-17 07:25:12.989653+00
2013-12-17 07:25:19 GMT LOG: database system is ready to accept connections
2013-12-17 07:25:19 GMT LOG: autovacuum launcher started

It’s never failed to write to pg_xlog though – or at least, never reported a
failure.

For now, I’ve altered the pg_stat_tmp path and we’ll see what happens
tomorrow. Ideally though, I’d like to catch this happening at runtime
and get a better picture of what’s blowing out here.

Further to this however, I notice that the base directory is consuming more
disk than I would have expected it to (all our relations are stored in
tablespaces assigned to other disks).
Looking for the larger culprits, I note the following files consuming 4.4GB

9.2/main/base/74641/2260957
9.2/main/base/74641/2260957.1
9.2/main/base/74641/2260957.2
9.2/main/base/74641/2260957.3
9.2/main/base/74641/2260957.4

I notice the timestamps are just shy of the time we experience that
pg_stat_tmp issue.

-rw------- 1 postgres postgres 1073741824 Dec 17 06:56 2260957
-rw------- 1 postgres postgres 1073741824 Dec 17 06:56 2260957.1
-rw------- 1 postgres postgres 1073741824 Dec 17 06:56 2260957.2
-rw------- 1 postgres postgres 1073741824 Dec 17 06:56 2260957.3
-rw------- 1 postgres postgres 328466432 Dec 17 06:56 2260957.4

They appear to be orphaned files, though I’m not sure this check is definitive:

clone=# select pg_relation_filepath(2260957);
pg_relation_filepath
----------------------

(1 row)

I probably would have thought these were temporary files – except that I
have temp_tablespaces set elsewhere.

Are these truly orphaned files that PostgreSQL no longer needs (possibly due
to a crash)? Is there a mechanism to scan the PGDATA directories to
locate such files?
FYI, we’re running 9.2.4
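
Something like the following is what I have in mind – a cross-check of the
file names in base/<dboid>/ against pg_class.relfilenode. This is only a
sketch (the filter_orphans helper is hypothetical, and the paths and
database name below are examples); note that mapped catalogs report
relfilenode = 0 in pg_class, so they would show up as false positives, and
nothing should ever be deleted while the server might still be using it.

```shell
# Sketch only: read candidate file names on stdin, strip segment and fork
# suffixes (.1, _fsm, _vm, _init), and print the base names that have no
# matching relfilenode in the file passed as $1. Mapped catalogs have
# relfilenode = 0 in pg_class, so treat any hit as a candidate only.
filter_orphans() {
    known=$(mktemp)
    sort -u "$1" > "$known"
    sed -E -e 's/\.[0-9]+$//' -e 's/_(fsm|vm|init)$//' |
        sort -u | comm -23 - "$known"
    rm -f "$known"
}

# Intended usage (paths and database name are illustrative):
#   psql -At -c 'SELECT relfilenode FROM pg_class' clone > /tmp/known
#   ls /var/lib/postgresql/9.2/main/base/74641 | filter_orphans /tmp/known
```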

Thanks again,

Tim

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Date: Monday, 16 December 2013 17:32
To: Tim Kane <tim(dot)kane(at)gmail(dot)com>
Cc: pgsql-general General <pgsql-general(at)postgresql(dot)org>
Subject: Re: [GENERAL] pg_stat_tmp

On Mon, Dec 16, 2013 at 5:57 AM, Tim Kane <tim(dot)kane(at)gmail(dot)com> wrote:
> Hi all,
>
> The past few days I’ve been encountering the following error, followed by a
> full db restart and recovery
>
>
> 2013-12-16 07:12:53 GMT LOG: could not write temporary statistics file
> "pg_stat_tmp/pgstat.tmp": No space left on device

Is that the only thing in the logs? pg_stat_tmp problems should not bring
down your database. But problems with pg_xlog running out of space
certainly can--but they should also be logged.

>
> This occurs at a time of moderate load, during the same set of operations each
> morning.
> Interestingly, when I execute this manually at any other time of day, the
> process completes normally.
>
> I presume that the pg_stat_tmp location is system-wide and likely is not
> impacted by temp_tablespaces
> The root partition, where PostgreSQL is installed, does not have a lot of
> disk available (4GB).
>
> My first instinct here is to symlink pg_stat_tmp against another disk with a
> little more room to breathe, however I’m surprised that pgstat.tmp would grow
> to be so large in the first place – possibly there is something else at play
> here.

We don't know how large it is getting! If pg_stat_tmp shares the same
partition as pg_xlog, base (as in the default configuration), and pg_log,
then any of those things could be filling up the partition, and pg_stat_tmp
could just be the canary, not the culprit.

Anyway, you don't need to use a symlink, you could just change
stats_temp_directory to point someplace else.
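
For example (the path here is illustrative – any directory writable by the
postgres user, on a filesystem with headroom, will do):

```
# postgresql.conf
stats_temp_directory = '/somewhere/with/space/pg_stat_tmp'  # takes effect on reload
```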

Your best bet is to run "du" or something similar to figure out where your
space is actually going.
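
For instance (GNU du assumed; the data-directory path is an example):

```shell
# Rank the largest subtrees up to two levels below a directory, biggest
# first. -x stays on one filesystem, so tablespaces or a pg_xlog symlink
# mounted elsewhere are not double-counted.
largest_subtrees() {
    du -x -k --max-depth=2 "$1" 2>/dev/null | sort -rn | head -20
}

largest_subtrees "${PGDATA:-/var/lib/postgresql/9.2/main}"
```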

Cheers,

Jeff
