Re: Streaming Replication Randomly Locking Up

From: John DeSoi <desoi(at)pgedit(dot)com>
To: Andrew Berman <rexxe98(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Streaming Replication Randomly Locking Up
Date: 2013-08-16 15:39:00
Message-ID: C9B41E27-B487-411F-A1BD-9FDC9340E5C3@pgedit.com
Lists: pgsql-general


On Aug 15, 2013, at 1:07 PM, Andrew Berman <rexxe98(at)gmail(dot)com> wrote:

> I'm having an issue where streaming replication just randomly stops working. I haven't been able to find anything in the logs that points to an issue, but the Postgres process shows a "waiting" status on the slave:
>
> postgres 5639 0.1 24.3 3428264 2970236 ? Ss Aug14 1:54 postgres: startup process recovering 000000010000053D0000003F waiting
> postgres 5642 0.0 21.4 3428356 2613252 ? Ss Aug14 0:30 postgres: writer process
> postgres 5659 0.0 0.0 177524 788 ? Ss Aug14 0:03 postgres: stats collector process
> postgres 7159 1.2 0.1 3451360 18352 ? Ss Aug14 17:31 postgres: wal receiver process streaming 549/216B3730
>
> The replication works great for days, but randomly seems to lock up and replication halts. I verified that the two databases were out of sync with a query on both of them. Has anyone experienced this issue before?
>
> Here are some relevant config settings:
>
> Master:
>
> wal_level = hot_standby
> checkpoint_segments = 32
> checkpoint_completion_target = 0.9
> archive_mode = on
> archive_command = 'rsync -a %p foo@foo:/var/lib/pgsql/9.1/wals/%f </dev/null'
> max_wal_senders = 2
> wal_keep_segments = 32

I recently posted about the same thing -- replication just stops after working OK for days or weeks, no errors in the logs on master or slave.

It appears I solved it by adding --timeout=30 to my rsync command. My guess is that some kind of network hang left rsync waiting forever, so the command never returned.
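
As a rough sketch, the same change applied to the archive_command quoted above might look like the following. This is an assumption for illustration only: the command that was actually modified is not shown in this message, and 30 seconds is simply the value mentioned here, not a tuned recommendation.

```conf
# Sketch only: the quoted archive_command with rsync's I/O timeout added.
# --timeout=30 makes rsync exit with an error if no data is transferred
# for 30 seconds, instead of blocking indefinitely on a hung connection.
# A non-zero exit tells PostgreSQL the archive attempt failed, so it
# retries the same WAL segment later.
archive_command = 'rsync -a --timeout=30 %p foo@foo:/var/lib/pgsql/9.1/wals/%f </dev/null'
```

The point of the timeout is that a failed archive attempt is retried, whereas a command that never returns stalls archiving altogether.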

John DeSoi, Ph.D.
