Re: Standby is not removing restored WAL segments

From: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
To: Alexey Klyukin <alexk(at)hintbits(dot)com>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Standby is not removing restored WAL segments
Date: 2014-09-15 15:33:32
Message-ID: CAECtzeU76EXpJEivcw-ioLNUkj6xWXVhDbvOZMkrF5jzv2oc9Q@mail.gmail.com
Lists: pgsql-admin

Hi,

2014-09-05 9:33 GMT+02:00 Alexey Klyukin <alexk(at)hintbits(dot)com>:

> Greetings,
>
> We've got a 9.3.5 DB running in standby mode for a fairly large DB
> (500GB) with busy WAL traffic (a couple of GBs per hour), and it
> occasionally 'forgets' to remove the segments it restored.
>
> checkpoint_segments is set to 128, and usually we observe around
> 270 segments accumulated, but when the problem occurs our check
> triggers at around 2K segments. A manual checkpoint command takes
> ages to complete there, a fast shutdown is very slow (around 10
> minutes, instead of the usual less than 1 minute) and the WAL receiver
> process is also unable to run for some reason.
>
> The only way to make this host delete WAL files is to restart it. The
> particularly notable restartpoint right after the shutdown shows
> quite a number of removed files and buffers written (shared_buffers
> is set to 8GB on this system):
>
> 2014-09-04 14:39:33.376 CEST,,,22354,,537a4553.5752,88217,,2014-05-19
> 19:54:27 CEST,,0,LOG,00000,"restartpoint complete: wrote 332473
> buffers (31.7%); 0 transaction log file(s) added, 1237 removed, 6
> recycled; write=9.745 s, sync=680.314 s, total=694.447 s; sync
> files=499, longest=37.774 s, average=1.363 s",,,,,,,,,""
>
> If we leave the host running, this restartpoint never happens.
>
> The only difference I can come up with from the other databases that
> do not show this behavior is that the host is running with
> max_standby_streaming_delay and max_standby_archive_delay set to -1,
> but at the time we observed the problem no queries were running on it
> at all.
>
> The problem occurs rarely but regularly, around once every 3 months.
> During this time PostgreSQL was upgraded from 9.0 to 9.3, which did
> not solve the issue.
>
> Any clues on how we can debug and diagnose the problem further to come
> up with a proper bug report, if it is a bug, or are we missing
> something in the configuration that causes this?
>
>
I have no direct answer for you, but we seem to have the same issue with
two of our customers. One of them is on 9.2.8. Do you know if there are
.ready files in the archive_status directory? Are they for old WAL
files? Can you tell us their names?
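
One way to collect that information is sketched below. It is only a
minimal example, assuming the 9.x layout where the status files live in
pg_xlog/archive_status under the data directory. The function name
list_ready_files and its parameters are made up for illustration and are
not part of PostgreSQL.

```python
import os
import time
from pathlib import Path


def list_ready_files(data_dir, now=None):
    """Return (segment_name, age_seconds) for each .ready marker, oldest first.

    Assumes the PostgreSQL 9.x layout: <data_dir>/pg_xlog/archive_status.
    """
    status_dir = Path(data_dir) / "pg_xlog" / "archive_status"
    if not status_dir.is_dir():
        return []
    now = time.time() if now is None else now
    entries = []
    for path in status_dir.glob("*.ready"):
        # The marker's mtime tells roughly when the segment was flagged.
        age = now - path.stat().st_mtime
        entries.append((path.name[: -len(".ready")], age))
    # Largest age first, so the oldest markers are listed at the top.
    entries.sort(key=lambda entry: entry[1], reverse=True)
    return entries


def format_report(entries):
    """Render one line per segment with its age in hours."""
    return [f"{name}  {age / 3600:.1f} h old" for name, age in entries]
```

Calling format_report(list_ready_files(os.environ["PGDATA"])) on the
standby, as a user who can read the data directory, would give the names
and approximate ages being asked about.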

We're still investigating the issue. Not that it's a real issue, but it's
still weird. And we'd like to understand what's happening.
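
To put the quoted restartpoint line into perspective, the numbers in it
can be converted to sizes. The sketch below assumes the default 16 MB
WAL segment size and 8 kB block size (neither is stated in the original
report), and the helper name summarize is hypothetical.

```python
import re

WAL_SEGMENT_BYTES = 16 * 1024 * 1024  # assumption: default WAL segment size
BLOCK_BYTES = 8 * 1024                # assumption: default block size

# Message text of the restartpoint line quoted in the original report.
LOG = (
    "restartpoint complete: wrote 332473 buffers (31.7%); "
    "0 transaction log file(s) added, 1237 removed, 6 recycled; "
    "write=9.745 s, sync=680.314 s, total=694.447 s; "
    "sync files=499, longest=37.774 s, average=1.363 s"
)

PATTERN = re.compile(
    r"wrote (?P<buffers>\d+) buffers.*?"
    r"(?P<added>\d+) transaction log file\(s\) added, "
    r"(?P<removed>\d+) removed, (?P<recycled>\d+) recycled; "
    r"write=(?P<write>[\d.]+) s, sync=(?P<sync>[\d.]+) s, "
    r"total=(?P<total>[\d.]+) s"
)


def summarize(line):
    """Convert buffer and segment counts to GiB and compute the sync share."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a checkpoint/restartpoint completion line")
    buffers = int(m["buffers"])
    removed = int(m["removed"])
    return {
        "written_gib": buffers * BLOCK_BYTES / 2**30,
        "removed_wal_gib": removed * WAL_SEGMENT_BYTES / 2**30,
        "sync_share": float(m["sync"]) / float(m["total"]),
    }


summary = summarize(LOG)
```

Under those assumptions the restartpoint wrote about 2.5 GiB of buffers,
removed about 19.3 GiB of WAL, and spent roughly 98% of its time in the
sync phase.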

--
Guillaume.
http://blog.guillaume.lelarge.info
http://www.dalibo.com
