Re: pg_xlog no longer rotating out

From: Mark Steben <mark(dot)steben(at)drivedominion(dot)com>
To: Johannes Truschnigg <johannes(at)truschnigg(dot)info>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: pg_xlog no longer rotating out
Date: 2019-07-02 21:31:38
Message-ID: CADyzmyzd6ZR1POtwnmvDUt0Xc9ksA+eooHUjukoWW-V_GpASPA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Thank you both your prompt response - much appreciated.
Alvaro, you were correct. There was a replication slot that was inactive.
We had to kill
replication on a slave server over the weekend and I never deleted the
replication slot.
Once I deleted it, logs also cleared out.
Johannes, I will also strongly consider your insight/recommendations.

On Tue, Jul 2, 2019 at 5:09 PM Johannes Truschnigg <johannes(at)truschnigg(dot)info>
wrote:

> Hi Mark,
>
> On Tue, Jul 02, 2019 at 04:30:25PM -0400, Mark Steben wrote:
> > Good afternoon,
> > We run postgres 9.4 with hot standby streaming replication. We have been
> > running abnormally high updates and the pg_xlogs have queued to about 80
> > percent of disk capacity. This often happens and the network is not fast
> > enough to handle the transfer of logs to the replicated server.
> > What I have done in the past, and done again today is to change the
> > archive_command from the script that scp's the logs to '/bin/true'. This
> > stops the logship and allows the normal pg_xlog rotation to catch up.
>
> Postgres cares only about the exit code of your "archive_command", so even
> when you're just trying to get rid of WAL segments (without actually moving
> them into an archive), it's best to set it to something that gives some
> kind
> of indication that something's actually happening with the WAL. I'd
> recommend
> replacing your `scp` invocation with a roughly equivalent command line,
> prefixed with `echo`, so you can check whether or not your archive_command
> is
> executed at all, and which WAL segment it's targeting.
>
> What is your *normal* archive_command setting, with `scp` involved? Do you
> have any "special" ssh_config flags in effect for its target host?
>
>
> > This has not happened this time. Logs have been stored on pg_xlog since
> > June 30 and the number of logs keep climbing. I am now manually moving
> > some of the logs that were processed on June 30 off to another disk to
> > alleviate the space. PLEASE HELP if you can.
>
> Manually touching pg_xlog (or pg_wal on newer releases) is not a good
> idea, as
> I'm sure you're aware. If I were in your position, my next steps would be:
>
> 1.) Create a number (think maybe five or six) files that are a few hundred
> MBytes in size on the filesystem hosting your database. The idea is that,
> IF
> your filesystems fills up, you have a number of shots to restart the
> database
> without breaking a sweat after removing one of these "insurance files".
>
> 2.) Figure out what's the root cause for your WAL not being
> archived/recognized as successfully processed, and what WAL you postmaster
> is
> currently trying to deal with, as detailed above, and fix it - or return to
> the list with your findings if you can't see what's wrong.
>
>
> > It looks like archiving is still occurring despite '/bin/true' being set.
> > Can I safely kill -term the archive thread?
>
> How exactly do you figure that there's still actual archiving going on?
> (Are
> you monitoring the remote host's "auth" log or anything like that, which
> would
> probably trigger upon scp-incurred logins?)
>
> Which process (full output from `ps -fp <that-particular-PID>`) are you
> talking about killing in particular?
>
>
>
> With all that said, you shouldn't really be using use `scp` as an
> archive_command in the first place. Maybe take a look at
> https://johannes.truschnigg.info/code/pg_archive_wal_segment-2.0.0/ (and
> its
> README file) for properties to consider when choosing an archive_command
> that's fit for the job. (In my view, it's always better to perform
> archiving
> to a *host-local* filesystem, and have its consumers pick it up from there,
> instead of shipping directly onto a consumer's filesystem over the
> network.)
>
> --
> with best regards:
> - Johannes Truschnigg ( johannes(at)truschnigg(dot)info )
>
> www: https://johannes.truschnigg.info/
> phone: +43 650 2 133337
> xmpp: johannes(at)truschnigg(dot)info
>
> Please do not bother me with HTML-email or attachments. Thank you.
>

--
*Mark Steben*
Database Administrator
@utoRevenue <http://www.autorevenue.com/> | Autobase
<http://www.autobase.net/>
CRM division of Dominion Dealer Solutions
95D Ashley Ave.
West Springfield, MA 01089
t: 413.327-3045
f: 413.383-9567

www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
www.drivedominion.com <http://www.autorevenue.com/>

<http://autobasedigital.net/marketing/DD12_sig.jpg>

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message William Sescu (Suva) 2019-07-03 14:09:36 REINDEX VERBOSE DATABASE not working
Previous Message Johannes Truschnigg 2019-07-02 21:09:24 Re: pg_xlog no longer rotating out