Re: Streaming Replication Randomly Locking Up

From: Lonni J Friedman <netllama(at)gmail(dot)com>
To: Andrew Berman <rexxe98(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Streaming Replication Randomly Locking Up
Date: 2013-08-15 18:51:27
Message-ID: CAP=oouF4xD+h=HTup9k5_1YdZsKwdj0Q2S-ssWAJAgate47Ocg@mail.gmail.com
Lists: pgsql-general

Are you certain that there are no relevant errors in the database logs
(on both master & slave)? Also, are you sure that you didn't
misconfigure logging such that errors wouldn't appear?

On Thu, Aug 15, 2013 at 11:45 AM, Andrew Berman <rexxe98(at)gmail(dot)com> wrote:
> Hi Lonni,
>
> Yes, I am using PG 9.1.9.
> Yes, 1 slave syncing from the master
> CentOS 6.4
> I don't see any network or hardware issues (e.g. NIC) but will look more
> into this. They are communicating on a private network and switch.
>
> I forgot to mention that after I restart the slave, everything syncs right
> back up and all is working again, so if it is a network issue, the
> replication is just stopping after some hiccup instead of retrying and
> resuming when things are back up.
>
> Thanks!
>
>
>
> On Thu, Aug 15, 2013 at 11:32 AM, Lonni J Friedman <netllama(at)gmail(dot)com>
> wrote:
>>
>> I've never seen this happen. Looks like you might be using 9.1? Are
>> you up to date on all the 9.1.x releases?
>>
>> Do you have just 1 slave syncing from the master?
>> Which OS are you using?
>> Did you verify that there aren't any network problems between the
>> slave & master?
>> Or hardware problems (like the NIC dying, or dropping packets)?
>>
>>
>> On Thu, Aug 15, 2013 at 11:07 AM, Andrew Berman <rexxe98(at)gmail(dot)com> wrote:
>> > Hello,
>> >
>> > I'm having an issue where streaming replication just randomly stops
>> > working.
>> > I haven't been able to find anything in the logs that points to an
>> > issue,
>> > but the Postgres process shows a "waiting" status on the slave:
>> >
>> > postgres 5639 0.1 24.3 3428264 2970236 ? Ss Aug14 1:54 postgres: startup process recovering 000000010000053D0000003F waiting
>> > postgres 5642 0.0 21.4 3428356 2613252 ? Ss Aug14 0:30 postgres: writer process
>> > postgres 5659 0.0 0.0 177524 788 ? Ss Aug14 0:03 postgres: stats collector process
>> > postgres 7159 1.2 0.1 3451360 18352 ? Ss Aug14 17:31 postgres: wal receiver process streaming 549/216B3730
>> >
>> > The replication works great for days, but randomly seems to lock up and
>> > replication halts. I verified that the two databases were out of sync
>> > with
>> > a query on both of them. Has anyone experienced this issue before?
>> >
>> > Here are some relevant config settings:
>> >
>> > Master:
>> >
>> > wal_level = hot_standby
>> > checkpoint_segments = 32
>> > checkpoint_completion_target = 0.9
>> > archive_mode = on
>> > archive_command = 'rsync -a %p foo(at)foo:/var/lib/pgsql/9.1/wals/%f </dev/null'
>> > max_wal_senders = 2
>> > wal_keep_segments = 32
>> >
>> > Slave:
>> >
>> > wal_level = hot_standby
>> > checkpoint_segments = 32
>> > #checkpoint_completion_target = 0.5
>> > hot_standby = on
>> > max_standby_archive_delay = -1
>> > max_standby_streaming_delay = -1
>> > #wal_receiver_status_interval = 10s
>> > #hot_standby_feedback = off
>> >
>> > Thank you for any help you can provide!
>> >
>> > Andrew
>> >
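
The ps output quoted above shows two different WAL positions on the slave: the startup process is replaying segment 000000010000053D0000003F, while the wal receiver has already streamed up to 549/216B3730. As a rough sketch (not something posted in this thread), the gap between the two can be estimated from those values alone. The function names are made up for illustration, and the arithmetic assumes the default 16 MB segment size and the pre-9.3 layout in which each log id holds 255 segments.

```python
# Sketch: estimate how far WAL replay trails WAL receipt on the standby,
# using only the two values visible in the quoted ps output.
# All function names here are hypothetical helpers, not PostgreSQL APIs.

SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default WAL segment size
SEGMENTS_PER_LOG = 255           # assumption: pre-9.3 layout, segment FF skipped


def segment_start(wal_file_name):
    """Split a 24-hex-digit WAL file name into (log id, byte offset)."""
    log_id = int(wal_file_name[8:16], 16)
    seg_no = int(wal_file_name[16:24], 16)
    return log_id, seg_no * SEGMENT_SIZE


def parse_location(location):
    """Split an 'X/Y' WAL location into (log id, byte offset)."""
    log_id, offset = location.split("/")
    return int(log_id, 16), int(offset, 16)


def to_bytes(position):
    """Flatten (log id, byte offset) into a single byte count."""
    log_id, offset = position
    return log_id * SEGMENTS_PER_LOG * SEGMENT_SIZE + offset


def replay_gap_bytes(replaying_file, received_location):
    """Bytes between the start of the segment being replayed and the received location."""
    received = to_bytes(parse_location(received_location))
    replaying = to_bytes(segment_start(replaying_file))
    return received - replaying


gap = replay_gap_bytes("000000010000053D0000003F", "549/216B3730")
print("replay trails receipt by roughly %.1f GiB" % (gap / 2.0 ** 30))
```

A large positive gap would suggest that WAL is still arriving over the network and that it is replay, not streaming, that has stalled. That is only one reading of the process list, so it is worth confirming against the logs on both servers.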
