Maximum number of WAL files in the pg_xlog directory

From: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
To: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Maximum number of WAL files in the pg_xlog directory
Date: 2014-08-08 07:08:50
Message-ID: CAECtzeUeGhwCiNTishH=+kxhiepJsHu7EO0J6-LEVO-ek5oPkg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

As part of our monitoring work for our customers, we stumbled upon an issue
with our customers' servers who have a wal_keep_segments setting higher
than 0.

We have a monitoring script that checks the number of WAL files in the
pg_xlog directory, according to the setting of three parameters
(checkpoint_completion_target, checkpoint_segments, and wal_keep_segments).
We usually add a percentage to the usual formula:

greatest(
(2 + checkpoint_completion_target) * checkpoint_segments + 1,
checkpoint_segments + wal_keep_segments + 1
)

And we have lots of alerts from the script for customers who set their
wal_keep_segments setting higher than 0.

So we started to question this sentence of the documentation:

There will always be at least one WAL segment file, and will normally not
be more than (2 + checkpoint_completion_target) * checkpoint_segments + 1
or checkpoint_segments + wal_keep_segments + 1 files.

(http://www.postgresql.org/docs/9.3/static/wal-configuration.html)

While doing some tests, it appears it would be more something like:

wal_keep_segments + (2 + checkpoint_completion_target) *
checkpoint_segments + 1

But after reading the source code (src/backend/access/transam/xlog.c), the
right formula seems to be:

wal_keep_segments + 2 * checkpoint_segments + 1

Here is how we went to this formula...

CreateCheckPoint(..) is responsible, among other things, for deleting and
recycling old WAL files. From src/backend/access/transam/xlog.c, master
branch, line 8363:

/*
* Delete old log files (those no longer needed even for previous
* checkpoint or the standbys in XLOG streaming).
*/
if (_logSegNo)
{
KeepLogSeg(recptr, &_logSegNo);
_logSegNo--;
RemoveOldXlogFiles(_logSegNo, recptr);
}

KeepLogSeg(...) function takes care of wal_keep_segments. From
src/backend/access/transam/xlog.c, master branch, line 8792:

/* compute limit for wal_keep_segments first */
if (wal_keep_segments > 0)
{
/* avoid underflow, don't go below 1 */
if (segno <= wal_keep_segments)
segno = 1;
else
segno = segno - wal_keep_segments;
}

IOW, the segment number (segno) is decremented according to the setting of
wal_keep_segments. segno is then sent back to CreateCheckPoint(...) via
_logSegNo. The RemoveOldXlogFiles() gets this segment number so that it can
remove or recycle all files before this segment number. This function gets
the number of WAL files to recycle with the XLOGfileslop constant, which is
defined as:

/*
* XLOGfileslop is the maximum number of preallocated future XLOG segments.
* When we are done with an old XLOG segment file, we will recycle it as a
* future XLOG segment as long as there aren't already XLOGfileslop future
* segments; else we'll delete it. This could be made a separate GUC
* variable, but at present I think it's sufficient to hardwire it as
* 2*CheckPointSegments+1. Under normal conditions, a checkpoint will free
* no more than 2*CheckPointSegments log segments, and we want to recycle
all
* of them; the +1 allows boundary cases to happen without wasting a
* delete/create-segment cycle.
*/
#define XLOGfileslop (2*CheckPointSegments + 1)

(in src/backend/access/transam/xlog.c, master branch, line 100)

IOW, PostgreSQL will keep wal_keep_segments WAL files before the current
WAL file, and then there may be 2*CheckPointSegments + 1 recycled ones.
Hence the formula:

wal_keep_segments + 2 * checkpoint_segments + 1

And this is what we usually find in our customers' servers. We may find
more WAL files, depending on the write activity of the cluster, but in
average, we get this number of WAL files.

AFAICT, the documentation is wrong about the usual number of WAL files in
the pg_xlog directory. But I may be wrong, in which case, the documentation
isn't clear enough for me, and should be fixed so that others can't
misinterpret it like I may have done.

Any comments? did I miss something, or should we fix the documentation?

Thanks.

--
Guillaume.
http://blog.guillaume.lelarge.info
http://www.dalibo.com

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2014-08-08 08:10:17 Re: pg_receivexlog add synchronous mode
Previous Message Benedikt Grundmann 2014-08-08 06:57:23 Re: Proposal: Incremental Backup