Re: WAL archive space planning?

From: Ray Stell <stellr(at)vt(dot)edu>
To: Steven Chang <stevenchang1213(at)gmail(dot)com>, Keith <keith(at)keithf4(dot)com>
Cc: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: WAL archive space planning?
Date: 2017-03-14 13:09:56
Message-ID: 07a5e30c-ddd1-9e7d-69a5-f95a8e6efebf@vt.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

On 3/13/17 11:53 PM, Steven Chang wrote:

> Dears,
>
> Just read a posgres monitoring moudule named check-postres,
> which is a ready enhance package on Debian Linux Distribution.
> https://bucardo.org/check_postgres/check_postgres.pl.html
> here is its readme about wal_files
> Maybe you can also use this module to help plan your space usage
> of WAL files.
>
> wal_files
> ("symlink: check_postgres_wal_files") Checks how many WAL files
> exist in the
> pg_xlog directory, which is found off of your data_directory,
> sometimes as a
> symlink to another physical disk for performance reasons. This
> action must be
> run as a superuser, in order to access the contents of the
> pg_xlog directory.
> The minimum version to use this action is Postgres 8.1. The
> --warning and
> --critical options are simply the number of files in the
> pg_xlog directory. What
> number to set this to will vary, but a general guideline is to
> put a number
> slightly higher than what is normally there, to catch problems
> early.
>
> Normally, WAL files are closed and then re-used, but a
> long-running open
> transaction, or a faulty archive_command script, may cause
> Postgres to create
> too many files. Ultimately, this will cause the disk they are
> on to run out of
> space, at which point Postgres will shut down.
>
>
> 2017-02-22 3:45 GMT+08:00 Keith <keith(at)keithf4(dot)com
> <mailto:keith(at)keithf4(dot)com>>:
>
>
>
> On Tue, Feb 21, 2017 at 2:33 PM, Ray Stell <stellr(at)vt(dot)edu
> <mailto:stellr(at)vt(dot)edu>> wrote:
>
> On 2/21/17 12:26 PM, Keith wrote:
>
>>
>>
>> On Tue, Feb 21, 2017 at 12:22 PM, Ray Stell <stellr(at)vt(dot)edu
>> <mailto:stellr(at)vt(dot)edu>> wrote:
>>
>> On 2/21/17 12:09 AM, Steven Chang wrote:
>>
>>> check this, his 2nd part introduces Stream Replication
>>> Implementation and tell you px_log retention and wal
>>> archive related parameters.
>>> https://www.itenlight.com/blog/2016/05/19/PostgreSQL+HA+with+pgpool-II+-+Part+2
>>> <https://www.itenlight.com/blog/2016/05/19/PostgreSQL+HA+with+pgpool-II+-+Part+2>
>>>
>>> 2017-02-18 2:30 GMT+08:00 Ray Stell <stellr(at)vt(dot)edu
>>> <mailto:stellr(at)vt(dot)edu>>:
>>>
>>> I was "planning" to turn on WAL archiving on a
>>> postgresql 9.4.11 server that currently is running
>>> with "wal_level=hot_standby" and streaming to a
>>> standby. I thought there would be a relationship
>>> between the rate of pg_xlog files and archive
>>> generation. When I turned up the
>>> archive_command/mode I found the scale of the
>>> archive target was wrong as I had based it on the
>>> pg_xlog file creation rate.
>>>
>>> When I turned on the archive command for a few
>>> minutes, pg_xlog dir contained these files for the
>>> time period:
>>>
>>> -rw------- 1 postgres postgres 16777216 Feb 17 06:47
>>> 00000001000023AC0000007F
>>> -rw------- 1 postgres postgres 16777216 Feb 17 06:45
>>> 00000001000023AC0000007E
>>>
>>> The archive command wrote 126, 16MB files:
>>>
>>> -rw------- 1 postgres postgres 16777216 Feb 17 06:47
>>> 000000010000237700000056
>>> -rw------- 1 postgres postgres 16777216 Feb 17 06:47
>>> 000000010000237700000055
>>> -rw------- 1 postgres postgres 16777216 Feb 17 06:47
>>> 000000010000237700000054
>>> -rw------- 1 postgres postgres 16777216 Feb 17 06:47
>>> 000000010000237700000053
>>> ...
>>>
>>> -rw------- 1 postgres postgres 16777216 Feb 17 06:45
>>> 0000000100002376000000DC
>>> -rw------- 1 postgres postgres 16777216 Feb 17 06:45
>>> 0000000100002376000000DB
>>> -rw------- 1 postgres postgres 16777216 Feb 17 06:45
>>> 0000000100002376000000DA
>>> -rw------- 1 postgres postgres 16777216 Feb 17 06:45
>>> 0000000100002376000000D9
>>>
>>> On servers that are not nearly as busy, I observe a
>>> one-to-one relationship between these files/rates.
>>>
>>> Is there a good WAL archive space planning guide?
>>>
>>> Is the way to collect planning data for this to turn
>>> on wal_debug?
>>>
>>> TIA!
>>>
>>>
>>>
>>>
>>> --
>>> Sent via pgsql-admin mailing list
>>> (pgsql-admin(at)postgresql(dot)org
>>> <mailto:pgsql-admin(at)postgresql(dot)org>)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-admin
>>> <http://www.postgresql.org/mailpref/pgsql-admin>
>>>
>>>
>> I'm interested in why I might observed a dramatic
>> difference between pg_xlog files and the archive target
>> files. Other dbs I see have them pretty much one-to-one,
>> but not in this case here. What might cause the large
>> variation? I got only a few pg_xlogs files each minute
>> and 42 archive files/min.
>>
>>
>>
>> Ray,
>>
>> Did you see my previous response about the archive_timeout
>> setting?
>
> Lost your post somehow, but I see it in the list archive.
> Thanks, Keith.
>
> There was a 30 minute timeout set, but I would not think that
> would increase the archive file generation as observed, does it?
>
> " When this parameter is greater than zero, the server will
> switch to a new segment file whenever this many seconds have
> elapsed since the last segment file switch, and there has been
> any database activity, including a single checkpoint.
> (Increasing checkpoint_timeout will reduce unnecessary
> checkpoints on an idle system.) "
>
> Your post seems to indicate having a non-zero value might
> increase the number of archive files, is that so? Maybe it is
> in milliseconds which might make sense with the observed
> archive rate, but the doc says seconds and it was set to
> 1800. I can set it to 0 and see if it changes things, but I'm
> a little confused.
>
>
>
>
> If it was set to 30 minutes, then no, it wouldn't explain what you
> saw. But we didn't know what it was set to, so that's why I asked.
> Make sure you're checking the value actually set in the database,
> and not just what's in postgresql.conf, since it could've changed
> but never been reloaded to put in place (type "show
> archive_timeout" in psql).
>
> If that's not it, not sure what may be causing what you're seeing.
>
> Keith
>
>
Thanks, Steven,

An interesting statement:

"long-running open transaction, or a faulty archive_command script, may
cause Postgres to create too many files. Ultimately, this will cause the
disk they are on to run out of space, at which point Postgres will shut
down."

The statement, "long-running open transaction...may cause Postgres to
create too many files" makes it sound like there could be the case where
there is not a 1-to-1 relationship between pg_xlog and the archive.
I've never seen that before and didn't think that was the case. Maybe
I'm reading too much into the statement.

I have to suspect that the "faulty" nature of the archive_command was
that it was just asking too much of the I/O subsystem and was just
getting way behind. Shockingly way behind. I didn't expect it. It is
a very busy system will many long running transactions. If I ever get
back to that problem, my first pass will be to reduce the load request.
I suppose I can do some bonnie tests to simulate the archive writing. I
was doing some multiplexing of the files in the archive shell. Let's
see what writing them once locally does first. If it can't do that and
keep up, I'm going to need a "bigger boat." It's a production system
so I only get a few cracks at it a year. I'll probably wait for the
next patch release. Going back to the last backup is within the SLA,
but I don't like not having the archive.

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Ray Stell 2017-03-14 17:16:06 Re: WAL archive space planning?
Previous Message Steven Chang 2017-03-14 05:24:21 Re: WAL archive space planning?