Big UPDATE breaking replication

From: Kouber Saparev <kouber(at)saparev(dot)com>
To: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Big UPDATE breaking replication
Date: 2013-06-04 11:53:51
Message-ID: 51ADD54F.3030702@saparev.com
Lists: pgsql-admin

Hello,

We are using the 9.1 built-in streaming replication.

Recently our slave nodes fell behind because of a single UPDATE statement.
It took only about 3 minutes to execute, but it affected half a million
records, and replication broke with a series of "requested WAL segment ...
has already been removed" error messages.

The WAL settings we have are:

max_wal_senders = 6
wal_keep_segments = 60
max_standby_archive_delay = 300s

I guess increasing the wal_keep_segments value would prevent this from
happening in the future, but by how much should it be increased? What value
would be high enough?
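
To make the question concrete, here is a rough sizing sketch. It assumes the
default 16 MB WAL segment size; the WAL rate, the tolerated lag and the
headroom factor are hypothetical figures for illustration, not measurements
from our system, and the function names are my own.

```python
import math

# Assumption: default WAL segment size of 16 MB (not changed at build time).
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def retained_megabytes(wal_keep_segments):
    """WAL kept on the master for standbys, in MB, for a given setting."""
    return wal_keep_segments * WAL_SEGMENT_BYTES // (1024 * 1024)


def segments_needed(wal_bytes_per_second, max_lag_seconds, headroom=1.5):
    """Estimate a wal_keep_segments value that covers a standby lagging
    max_lag_seconds while the master writes WAL at the given rate.
    headroom is an arbitrary safety factor, not a PostgreSQL setting."""
    total_bytes = wal_bytes_per_second * max_lag_seconds * headroom
    return math.ceil(total_bytes / WAL_SEGMENT_BYTES)


if __name__ == "__main__":
    # Current setting: 60 segments.
    print(retained_megabytes(60), "MB retained with wal_keep_segments = 60")
    # Hypothetical: 20 MB/s of WAL during a bulk UPDATE, 10 minutes of lag.
    print(segments_needed(20 * 1024 * 1024, 600), "segments for that scenario")
```

With the current value of 60, only about 960 MB of WAL is kept, so the real
unknown is how much WAL a statement like this generates per second.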

We also noticed a strange error message appearing shortly before and
after this same statement: "LOG: out of file descriptors: Too many open
files; release and retry".

Could it be related somehow and what does it mean exactly?
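
"Too many open files" is the operating system's text for hitting a
per-process descriptor limit, so one thing that can be checked independently
of PostgreSQL is the limit in effect for the account running the server. A
minimal sketch, assuming a Unix-like host and that it is run as that same
user; the helper names are hypothetical and the value 1000 is only an example
to compare against.

```python
import resource


def fd_limits():
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fits_within(soft_limit, wanted_files):
    """True if a process wanting `wanted_files` descriptors stays under
    the soft limit. RLIM_INFINITY means no limit is imposed."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return wanted_files <= soft_limit


if __name__ == "__main__":
    soft, hard = fd_limits()
    print("soft limit:", soft, "hard limit:", hard)
    # 1000 is an example figure, not a value read from our configuration.
    print("room for 1000 files per process:", fits_within(soft, 1000))
```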

Here's an excerpt from the master DB log:

May 30 12:23:09 DB1 postgres[28201]: [13-1] user=www,db=xxx LOG: out of file descriptors: Too many open files; release and retry
May 30 12:23:09 DB1 postgres[28201]: [13-2] user=www,db=xxx CONTEXT: writing block 0 of relation base/2819385/2820788
May 30 12:23:09 DB1 postgres[28201]: [13-3] user=www,db=xxx STATEMENT: UPDATE
May 30 12:23:09 DB1 postgres[28201]: [13-4] ^I message
May 30 12:23:09 DB1 postgres[28201]: [13-5] ^I SET
May 30 12:23:09 DB1 postgres[28201]: [13-6] ^I sender_has_deleted=TRUE,
May 30 12:23:09 DB1 postgres[28201]: [13-7] ^I receiver_has_deleted=TRUE
May 30 12:23:09 DB1 postgres[28201]: [13-8] ^I WHERE from_profile_sid=870

...

May 30 12:39:47 DB1 postgres[9053]: [2-1] user=postgres,db=[unknown] FATAL: requested WAL segment 00000001000002DE000000BD has already been removed
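
For anyone reading the segment name in that FATAL line: a WAL file name is 24
hexadecimal digits, made of three 8-digit fields (timeline, log file number,
segment number). A small sketch that splits it; the type and function names
are my own, not part of any PostgreSQL tool.

```python
from typing import NamedTuple


class WalSegment(NamedTuple):
    timeline: int
    log: int
    seg: int


def parse_wal_segment(name: str) -> WalSegment:
    """Split a 24-hex-digit WAL segment file name into its three fields."""
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %r" % name)
    # int(..., 16) raises ValueError if a field is not valid hexadecimal.
    return WalSegment(
        timeline=int(name[0:8], 16),
        log=int(name[8:16], 16),
        seg=int(name[16:24], 16),
    )


if __name__ == "__main__":
    print(parse_wal_segment("00000001000002DE000000BD"))
```

Comparing the parsed fields with the oldest file still present in pg_xlog
shows how many segments behind the standby was when it reconnected.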

Regards,
--
Kouber Saparev
