Re: WAL segement issues on both master and slave server

From: Payal Singh <payal(at)omniti(dot)com>
To: Chris Kim <chrisk(at)propaas(dot)com>
Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: WAL segement issues on both master and slave server
Date: 2017-10-19 17:49:54
Message-ID: CANUg7LDvYEq3_nuHUZqZsxmUV6w0RJFMEAPuFjLxxhYn1L3t4Q@mail.gmail.com
Lists: pgsql-admin

>
> On Thu, Oct 19, 2017 at 1:25 PM, Chris Kim <chrisk(at)propaas(dot)com> wrote:
>
>> Hi there,
>>
>> I am running into an issue with the number of files that reside in the
>> pg_xlog directory of my compliance database server (this one is the master
>> server in our master-slave setup). Sometime earlier this year, I modified
>> the location of the PITR directory, which caused WAL segments not to be
>> sent to the correct location and crashed the DB. I fixed that so that it
>> points to the correct location, but since then the number of files in the
>> pg_xlog directory has gone up from around 898 to 1025. I didn't have a
>> chance to look into this until now, so my question is: do you know of an
>> easy way to safely clean up some of these files in the pg_xlog directory?
>> I believe there might be some orphaned files there and would like to
>> clean those up.
>>
>
How is replication being done? Is the replica in sync with the master?
Check the lag on the replica and the replication byte lag on the master. If
they are in sync, an `ls -l | less` in the WAL directory should show you
which older files are being kept. Before you remove any files, though, do
check the postgres and archive logs on both master and replica for any
ERROR or FATAL messages. As an extra precaution, you can just move the
older files to another location where postgres can't access them, and if
something breaks, you can move them back. If all looks good after moving,
you can delete the files you moved.
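
To make the "move aside first, delete later" idea concrete, here is a
minimal Python sketch. The function names and the holding directory are
hypothetical, not part of any PostgreSQL tooling. It only lists segment
files oldest first (roughly what `ls -l` shows) and moves the names you
pass in. Deciding which segments are no longer needed is still up to you.

```python
import os
import re
import shutil

# WAL segment file names are 24 uppercase hex digits.
WAL_SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def list_segments_oldest_first(wal_dir):
    """Return (name, mtime) pairs for WAL segments, oldest mtime first."""
    segments = []
    for name in os.listdir(wal_dir):
        path = os.path.join(wal_dir, name)
        if WAL_SEGMENT_RE.match(name) and os.path.isfile(path):
            segments.append((name, os.path.getmtime(path)))
    return sorted(segments, key=lambda item: (item[1], item[0]))


def move_aside(wal_dir, holding_dir, names):
    """Move the named segments into holding_dir; return the names moved."""
    os.makedirs(holding_dir, exist_ok=True)
    moved = []
    for name in names:
        if not WAL_SEGMENT_RE.match(name):
            continue  # skip .history, .backup and archive_status entries
        src = os.path.join(wal_dir, name)
        if os.path.isfile(src):
            shutil.move(src, os.path.join(holding_dir, name))
            moved.append(name)
    return moved


def move_back(wal_dir, holding_dir, names):
    """Undo move_aside: return the named segments to wal_dir."""
    return move_aside(holding_dir, wal_dir, names)
```

Treat the modification time as a hint only: postgres recycles old segments
by renaming them, so a file with an old timestamp is not necessarily
orphaned.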

I would highly recommend having a monitor in place that tracks the number
of WAL files in the WAL directory and alerts if it gets too high.
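
A monitor like that can be very small. The sketch below is one possible
shape, not an existing plugin: the function names are made up, and the
warning and critical thresholds are values you would pick for your own
system.

```python
import os
import re

# WAL segment file names are 24 uppercase hex digits.
WAL_SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def count_wal_segments(wal_dir):
    """Count WAL segment files in wal_dir, ignoring everything else."""
    return sum(1 for name in os.listdir(wal_dir) if WAL_SEGMENT_RE.match(name))


def check_wal_count(wal_dir, warn_at, crit_at):
    """Return (status, count); status is "OK", "WARNING" or "CRITICAL"."""
    count = count_wal_segments(wal_dir)
    if count >= crit_at:
        status = "CRITICAL"
    elif count >= warn_at:
        status = "WARNING"
    else:
        status = "OK"
    return status, count
```

Run from cron or a monitoring agent, something like
`check_wal_count("/path/to/pg_xlog", 900, 1000)` (path and thresholds are
placeholders) would give you a status you can alert on.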

>
>> Also, on the standby, the pg_xlog directory appears to be growing on a
>> daily basis. The WAL files are being cleaned up, but I don't believe at a
>> fast enough rate. This directory is over 650GB in size, and I would like
>> to revisit whether any of the parameters need to be changed in the
>> postgresql.conf file, since it has been almost 5 years since I last
>> touched this.
>>
>> Let me know if you need more details to clarify.
>>
>> Thanks.
>>
>>
Again, this might be a sign that replication is lagging. If your cleanup
command is correct and the related logs show nothing suspicious, checking
the replication lag would be a good first step to determine the cause.
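
For the byte lag itself, you compare the master's current WAL location
with the location the replica has replayed (on 9.x these come from
pg_current_xlog_location() and pg_last_xlog_replay_location()). Where the
server offers pg_xlog_location_diff(), that is the simplest way to get the
difference. As a rough illustration of the arithmetic, the Python sketch
below does it by hand, with hypothetical helper names, assuming the plain
64-bit location layout used since 9.3. Older releases number locations
slightly differently, so treat the result there as approximate.

```python
def lsn_to_bytes(lsn):
    """Convert a location such as '16/B374D848' to a byte position.

    Assumes the part before the slash is the high 32 bits and the part
    after it is the low 32 bits, both written in hex.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def byte_lag(master_lsn, replica_lsn):
    """Bytes of WAL the replica still has to replay (0 when in sync)."""
    return lsn_to_bytes(master_lsn) - lsn_to_bytes(replica_lsn)
```

A lag that keeps growing over time, rather than hovering near zero, would
point at the replica falling behind.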

Thanks,
Payal Singh,
Database Administrator,
OmniTI Computer Consulting Inc.
Phone: 240.646.0770 x 253
