Re: Streaming Replication Randomly Locking Up

From: Andrew Berman <rexxe98(at)gmail(dot)com>
To: Lonni J Friedman <netllama(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Streaming Replication Randomly Locking Up
Date: 2013-08-15 19:38:02
Message-ID: CAEVpa74hs1sK1-uM9GGvu5aqgmK+y+aPTrHz8ZvzV_krU=OT1g@mail.gmail.com
Lists: pgsql-general

Yep, that's the first thing I'm going to do.
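
For reference, here is a rough sketch of what I have in mind for postgresql.conf on both servers. The prefix format and the choice of settings are my own assumptions, not something suggested in this thread, and I haven't tested them on these machines yet:

# Assumed settings, not yet tested on these servers
log_line_prefix = '%t [%p] %h '   # %t = timestamp, %p = process ID, %h = remote host
log_hostname = on                 # log host names, not just IPs (may add a name-resolution cost)
log_connections = on              # log each connection to the server
log_disconnections = on           # log session end, including session duration

With connections and disconnections logged, it should be easier to tell which client is behind each 'unexpected EOF' entry and whether the timing lines up with the replication stalls.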

On Thu, Aug 15, 2013 at 12:34 PM, Lonni J Friedman <netllama(at)gmail(dot)com> wrote:

> I'd suggest enhancing your logging to include time/datestamps for
> every entry, and also the client hostname. That will help to rule
> in/out those 'unexpected EOF' errors.
>
> On Thu, Aug 15, 2013 at 12:22 PM, Andrew Berman <rexxe98(at)gmail(dot)com> wrote:
> > The only thing I see that is a possibility for the issue is in the slave
> > log:
> >
> > LOG: unexpected EOF on client connection
> > LOG: could not receive data from client: Connection reset by peer
> >
> > I don't know if that's related or not as it could just be somebody running
> > a query. The log file does seem to be riddled with these but the
> > replication failures don't happen constantly.
> >
> > As far as I know I'm not swallowing any errors. The logging is all set as
> > the default:
> >
> > log_destination = 'stderr'
> > logging_collector = on
> > #client_min_messages = notice
> > #log_min_messages = warning
> > #log_min_error_statement = error
> > #log_min_duration_statement = -1
> > #log_checkpoints = off
> > #log_connections = off
> > #log_disconnections = off
> > #log_error_verbosity = default
> >
> > I'm going to have a look at the NICs to make sure there's no issue there.
> >
> > Thanks again for your help!
> >
> >
> > On Thu, Aug 15, 2013 at 11:51 AM, Lonni J Friedman <netllama(at)gmail(dot)com>
> > wrote:
> >>
> >> Are you certain that there are no relevant errors in the database logs
> >> (on both master & slave)? Also, are you sure that you didn't
> >> misconfigure logging such that errors wouldn't appear?
> >>
> >> On Thu, Aug 15, 2013 at 11:45 AM, Andrew Berman <rexxe98(at)gmail(dot)com> wrote:
> >> > Hi Lonni,
> >> >
> >> > Yes, I am using PG 9.1.9.
> >> > Yes, 1 slave syncing from the master
> >> > CentOS 6.4
> >> > I don't see any network or hardware issues (e.g. NIC) but will look more
> >> > into this. They are communicating on a private network and switch.
> >> >
> >> > I forgot to mention that after I restart the slave, everything syncs
> >> > right back up and all is working again, so if it is a network issue, the
> >> > replication is just stopping after some hiccup instead of retrying and
> >> > resuming when things are back up.
> >> >
> >> > Thanks!
> >> >
> >> >
> >> >
> >> > On Thu, Aug 15, 2013 at 11:32 AM, Lonni J Friedman <netllama(at)gmail(dot)com> wrote:
> >> >>
> >> >> I've never seen this happen. Looks like you might be using 9.1? Are
> >> >> you up to date on all the 9.1.x releases?
> >> >>
> >> >> Do you have just 1 slave syncing from the master?
> >> >> Which OS are you using?
> >> >> Did you verify that there aren't any network problems between the
> >> >> slave & master?
> >> >> Or hardware problems (like the NIC dying, or dropping packets)?
> >> >>
> >> >>
> >> >> On Thu, Aug 15, 2013 at 11:07 AM, Andrew Berman <rexxe98(at)gmail(dot)com>
> >> >> wrote:
> >> >> > Hello,
> >> >> >
> >> >> > I'm having an issue where streaming replication just randomly stops
> >> >> > working. I haven't been able to find anything in the logs that points
> >> >> > to an issue, but the Postgres process shows a "waiting" status on the
> >> >> > slave:
> >> >> >
> >> >> > postgres 5639 0.1 24.3 3428264 2970236 ? Ss Aug14 1:54 postgres: startup process recovering 000000010000053D0000003F waiting
> >> >> > postgres 5642 0.0 21.4 3428356 2613252 ? Ss Aug14 0:30 postgres: writer process
> >> >> > postgres 5659 0.0 0.0 177524 788 ? Ss Aug14 0:03 postgres: stats collector process
> >> >> > postgres 7159 1.2 0.1 3451360 18352 ? Ss Aug14 17:31 postgres: wal receiver process streaming 549/216B3730
> >> >> >
> >> >> > The replication works great for days, but randomly seems to lock up
> >> >> > and replication halts. I verified that the two databases were out of
> >> >> > sync with a query on both of them. Has anyone experienced this issue
> >> >> > before?
> >> >> >
> >> >> > Here are some relevant config settings:
> >> >> >
> >> >> > Master:
> >> >> >
> >> >> > wal_level = hot_standby
> >> >> > checkpoint_segments = 32
> >> >> > checkpoint_completion_target = 0.9
> >> >> > archive_mode = on
> >> >> > archive_command = 'rsync -a %p foo@foo:/var/lib/pgsql/9.1/wals/%f </dev/null'
> >> >> > max_wal_senders = 2
> >> >> > wal_keep_segments = 32
> >> >> >
> >> >> > Slave:
> >> >> >
> >> >> > wal_level = hot_standby
> >> >> > checkpoint_segments = 32
> >> >> > #checkpoint_completion_target = 0.5
> >> >> > hot_standby = on
> >> >> > max_standby_archive_delay = -1
> >> >> > max_standby_streaming_delay = -1
> >> >> > #wal_receiver_status_interval = 10s
> >> >> > #hot_standby_feedback = off
> >> >> >
> >> >> > Thank you for any help you can provide!
> >> >> >
> >> >> > Andrew
> >> >> >
> >
> >
>
>
>
> --
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> L. Friedman netllama(at)gmail(dot)com
> LlamaLand https://netllama.linux-sxs.org
>
