From: Kouber Saparev <kouber(at)saparev(dot)com>
To: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Big UPDATE breaking replication
Date: 2013-06-06 12:19:51
Message-ID: 51B07E67.9060707@saparev.com
Lists: pgsql-admin
On 06/04/2013 06:47 PM, Steve Crawford wrote:
> On 06/04/2013 04:53 AM, Kouber Saparev wrote:
>> Hello,
>>
>> We are using the 9.1 built-in streaming replication.
>>
>> Recently our slave nodes fell behind because of an UPDATE statement. It
>> took about 3 minutes to execute, but it affected half a million records,
>> hence the replication broke with the "requested WAL segment ... has
>> already been removed" series of error messages.
>>
>> The WAL settings we have are:
>>
>> max_wal_senders = 6
>> wal_keep_segments = 60
>> max_standby_archive_delay = 300s
>>
>>
>> I guess increasing the wal_keep_segments value would prevent it from
>> happening in the future, but by how much should it be increased? What
>> value would be high enough?
>
> You can use WAL shipping to protect against this or set
> wal_keep_segments higher. I set my main server to a tad over 1,000 and
> know I can do a full restore on the master without coming close to
> breaking replication. My xlog dir is 17G. A bit of a waste, perhaps,
> but I've noted no ill effects and it's still only a sliver of the
> total drive capacity.
Well, streaming replication already involves WAL shipping. Do you mean
to archive the WALs somewhere and then scp them with a cron job? I
doubt that would be fast enough either.
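Or do you mean continuous archiving on top of streaming? Just so I am
sure I understand you correctly, something along these lines (the paths,
host and user below are only placeholders):

  # master, postgresql.conf
  archive_mode = on
  archive_command = 'cp %p /path/to/archive/%f'   # or rsync to shared storage

  # standby, recovery.conf
  standby_mode = 'on'
  primary_conninfo = 'host=master-host user=replicator'
  restore_command = 'cp /path/to/archive/%f %p'

That way the standby could fall back to the archive whenever streaming
falls too far behind, instead of hitting the "requested WAL segment ...
has already been removed" error.

By the way, with the default 16 MB segment size our wal_keep_segments = 60
only retains about 960 MB of WAL, while your ~1,000 segments come to
roughly 16 GB, which matches your 17G xlog directory.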
>
>
>>
>> Also, we noticed a strange error message appearing shortly before and
>> after this same statement: "LOG: out of file descriptors: Too many open
>> files; release and retry".
>>
>> Could it be related somehow and what does it mean exactly?
>
> What is your setup (Linux? Mac? Windows? VM in the cloud? How many
> simultaneous connections?...) You will find a lot of info in old
> messages on the subject but, annotating the docs: If the kernel is
> enforcing a safe per-process limit, you don't need to worry about this
> setting. But on some platforms (notably, most BSD systems - looking at
> you Mac), the kernel will allow individual processes to open many more
> files than the system can actually support if many processes all try
> to open that many files....
>
> An old email from Tom Lane notes that you need to make sure your
> kernel can support approximately
> max_connections * (max_files_per_process + max_connections) open file
> handles plus any requirements imposed by other processes on the system
> and comments that Mac treats each semaphore as an open file.
>
> My interpretation is that if your kernel enforces things properly you
> don't need to worry. If it doesn't, reduce your max_connections and/or
> max_files_per_process as needed.
We are running virtual machines with Ubuntu Server 12.04.1 LTS. The
error looks quite confusing, given:
root(at)DB1:~# cat /proc/sys/fs/file-max
1621645
root(at)DB1:~# lsof | wc -l
7013
It is not normally present in our logs, which is why I suspect it is
somehow correlated with the WAL issue above. Could it be that the
master was not able to properly open the WAL sender for this
transaction?
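For what it's worth, by the formula Tom Lane gives, the limits should
not be anywhere near exhausted here: with, say, max_connections = 100
and the default max_files_per_process = 1000, the kernel would need
roughly 100 * (1000 + 100) = 110,000 descriptors, still far below the
file-max above. I will also double-check the per-process limit and the
server settings, something like (<pid> being an actual postgres backend
pid):

  grep 'open files' /proc/<pid>/limits
  psql -c 'SHOW max_files_per_process'
  psql -c 'SHOW max_connections'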
--
Kouber Saparev