BUG #7902: lazy cleanup of extraneous WAL files can cause out of disk issues

From: jeff(at)pgexperts(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #7902: lazy cleanup of extraneous WAL files can cause out of disk issues
Date: 2013-02-22 22:55:24
Message-ID: E1U91WW-0006rq-82@wrigleys.postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 7902
Logged by: Jeff Frost
Email address: jeff(at)pgexperts(dot)com
PostgreSQL version: 9.2.3
Operating system: Ubuntu 12.04
Description:

While doing acceptance testing on a new Ubuntu 12.04 PostgreSQL server
running 9.2.3, we set checkpoint_segments = 128 and
checkpoint_completion_target = 0.9, and placed pg_xlog on a separate 20G
partition. Also, archive_mode is off on this system.

According to the docs, the system should try to keep the number of WAL
files close to 3 * checkpoint_segments + 1. Unfortunately, this does not
appear to be the case: a pgbench run can fill the pg_xlog partition
completely.
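The ceiling quoted above can be turned into a disk size with a little arithmetic. This is a sketch that assumes the default 16 MB WAL segment size (the report does not state the segment size), and the function name is hypothetical:

```shell
#!/bin/bash
# Disk space implied by the "3 * checkpoint_segments + 1" WAL file ceiling.
# Assumes 16 MB per segment unless a second argument overrides it.
expected_wal_ceiling_mb() {
  local checkpoint_segments=$1
  local segment_mb=${2:-16}   # default compile-time WAL segment size
  local max_segments=$(( 3 * checkpoint_segments + 1 ))
  echo $(( max_segments * segment_mb ))
}

# checkpoint_segments = 128 gives 385 segments of 16 MB each
echo "expected ceiling: $(expected_wal_ceiling_mb 128) MB"
```

With checkpoint_segments = 128 this prints `expected ceiling: 6160 MB`, about 6 GB, well inside the 20G partition. By the same arithmetic, the 22G peak reported below corresponds to roughly 1,400 segments.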

The pgbench run script looks like this:

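#!/bin/bash

# Rebuild the test database at scale factor 1000
dropdb bench
createdb bench
pgbench -i -s 1000 bench
# Refresh planner statistics, then start from a fresh checkpoint
vacuumdb -a --analyze-only
psql -c "checkpoint"
# 64 clients, 16 worker threads, per-statement latency report (-r), 600 seconds
pgbench -c 64 -j 16 -r -T 600 bench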

While the pgbench run does trigger many xlog-based checkpoints (checkpoints
started because checkpoint_segments was reached), they never seem to remove
more than a few files. pg_xlog often grows past 20G and the PostgreSQL
service falls over.
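One way to watch this growth during a pgbench run is to count segment files in pg_xlog periodically. This is a sketch, not a tool from the report: the function names are hypothetical, and it assumes segment files carry the standard 24-hex-character names and the default 16 MB size.

```shell
#!/bin/bash
# Count WAL segment files (24 hex characters: timeline, log, segment).
# Skips archive_status/ and *.history / *.backup files.
count_wal_segments() {
  local n=0 f
  for f in "$1"/*; do
    [[ -f $f && ${f##*/} =~ ^[0-9A-F]{24}$ ]] && n=$((n + 1))
  done
  echo "$n"
}

# Print a timestamped segment count (and size at 16 MB each) every N seconds.
watch_pg_xlog() {
  local xlog_dir=$1 interval=${2:-10} n
  while sleep "$interval"; do
    n=$(count_wal_segments "$xlog_dir")
    printf '%s %s segments (~%s MB)\n' "$(date '+%H:%M:%S')" "$n" "$(( n * 16 ))"
  done
}
```

For example, start `watch_pg_xlog /var/lib/postgresql/9.2/main/pg_xlog 10 &` before the pgbench step. That path is the Ubuntu default for a 9.2 cluster named `main` and is only an assumption here, since pg_xlog on this system lives on its own partition.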

After moving pg_xlog to a larger partition, it appears to peak at about 22G.

A manual checkpoint after the run always brings it back down to ~4G.

Interestingly, I was unable to reproduce this with 9.2.3 on our in-house test
system; however, the in-house system has far less RAM and CPU, so this may
only be an issue on larger systems. The system that exhibits the issue has
128G of RAM and 16 cores (32 with hyperthreading).

I also tested 9.2.2 on the affected system and it behaved the same way.

Hope to test 9.1.8 in the next few days.
