From: | Steven Schlansker <steven(at)likeness(dot)com> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | "pgsql-general(at)postgresql(dot)org postgresql" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Trimming transaction logs after extended WAL archive failures |
Date: | 2014-03-26 16:44:05 |
Message-ID: | D0117159-87B3-4CF0-864E-05DD52570B45@likeness.com |
Lists: | pgsql-general |
On Mar 26, 2014, at 9:04 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Tue, Mar 25, 2014 at 6:33 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> On Tuesday, March 25, 2014, Steven Schlansker <steven(at)likeness(dot)com> wrote:
>>> Hi everyone,
>>>
>>> I have a Postgres 9.3.3 database machine. Due to some intelligent work on the part of someone who shall remain nameless, the WAL archive command included a ‘> /dev/null 2>&1’ which masked archive failures until the disk entirely filled with 400GB of pg_xlog entries.
>>
>> PostgreSQL itself should be logging failures to the server log, regardless of whether those failures log themselves.
>>
>>> I have fixed the archive command and can see WAL segments being shipped off the server; however, the xlog remains at a stable size and is not shrinking. In fact, it’s still growing, at a (much slower) rate.
>>
>> The leading edge of the log files should be archived as soon as they fill up, and recycled/deleted two checkpoints later. The trailing edge should be archived upon checkpoints and then recycled or deleted. I think there is a throttle on how many off the trailing edge are archived each checkpoint. So issue a bunch of "CHECKPOINT;" commands for a while and see if that clears it up.

Indeed, forcing a bunch of CHECKPOINT commands started to get things moving again.

> Actually my description is rather garbled, mixing up what I saw when wal_keep_segments was lowered, not when recovering from a long-lasting archive failure. Nevertheless, checkpoints are what provoke the removal of excessive WAL files. Are you logging checkpoints? What do they say? Also, what is in pg_xlog/archive_status?

I do log checkpoints, but most of them recycle and don’t remove:
Mar 26 16:09:36 prd-db1a postgres[29161]: [221-1] db=,user= LOG: checkpoint complete: wrote 177293 buffers (4.2%); 0 transaction log file(s) added, 0 removed, 56 recycled; write=539.838 s, sync=0.049 s, total=539.909 s; sync files=342, longest=0.015 s, average=0.000 s
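A log line like the one above can be checked mechanically across many checkpoints. The following is a minimal Python sketch, assuming the 9.3-era "checkpoint complete" wording shown in that line; parse_checkpoint_counts is a hypothetical helper, not something PostgreSQL provides. Only the "removed" counter shrinks pg_xlog: recycled segments are renamed for reuse as future WAL files, so a checkpoint reporting 0 removed and 56 recycled leaves the number of segment files unchanged.

```python
import re

# Matches the segment counters in a 9.3-style "checkpoint complete" message,
# e.g. "0 transaction log file(s) added, 0 removed, 56 recycled".
_COUNTS = re.compile(
    r"(\d+) transaction log file\(s\) added, (\d+) removed, (\d+) recycled"
)


def parse_checkpoint_counts(line):
    """Return (added, removed, recycled) for a checkpoint log line, or None."""
    m = _COUNTS.search(line)
    if m is None:
        return None
    added, removed, recycled = (int(g) for g in m.groups())
    return added, removed, recycled


if __name__ == "__main__":
    sample = (
        "LOG: checkpoint complete: wrote 177293 buffers (4.2%); "
        "0 transaction log file(s) added, 0 removed, 56 recycled; "
        "write=539.838 s"
    )
    print(parse_checkpoint_counts(sample))  # (0, 0, 56)
```

Summing the "removed" column over a day of logs gives a rough rate at which pg_xlog is actually draining.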
That said, after letting the db run / checkpoint / archive overnight, the xlog did indeed start to slowly shrink. The pace at which it is shrinking is somewhat unsatisfying, but at least we are making progress now!
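To watch that shrinkage, and to answer Jeff's question about pg_xlog/archive_status, one can count segment files and archive-status markers directly. In the 9.x layout, a segment waiting for archive_command has a <segment>.ready marker in archive_status, which is renamed to <segment>.done once archiving succeeds. With archiving enabled, checkpoints recycle or remove only segments marked .done. The sketch below assumes read access to the data directory; wal_backlog and the path in the usage note are hypothetical.

```python
import os


def wal_backlog(xlog_dir):
    """Count WAL segments and archive-status markers under a pg_xlog directory.

    Assumes the PostgreSQL 9.x layout: segment files live directly in pg_xlog,
    and archive_status/ holds "<segment>.ready" (not yet archived) and
    "<segment>.done" (archived) marker files.
    """
    # WAL segment names are 24 uppercase hex digits; this skips history files,
    # backup labels, and the archive_status subdirectory itself.
    segments = [
        name for name in os.listdir(xlog_dir)
        if len(name) == 24 and all(c in "0123456789ABCDEF" for c in name)
    ]
    status_dir = os.path.join(xlog_dir, "archive_status")
    markers = os.listdir(status_dir) if os.path.isdir(status_dir) else []
    return {
        "segments": len(segments),
        "ready": sum(1 for m in markers if m.endswith(".ready")),
        "done": sum(1 for m in markers if m.endswith(".done")),
    }
```

Calling wal_backlog("/var/lib/postgresql/9.3/main/pg_xlog") (a hypothetical path) every few minutes should show "ready" falling toward zero as the archiver catches up, and "segments" falling only once checkpoints start removing rather than recycling files.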
I guess if I had just been patient I could have saved some mailing list traffic. But patience is hard when your production database system is running at 0% free disk :)
Thanks, everyone, for the help. If the log continues to shrink, I should be out of the woods now.
Best,
Steven