From: | Patrick Krecker <patrick(at)judicata(dot)com> |
---|---|
To: | Andres Freund <andres(at)2ndquadrant(dot)com> |
Cc: | Chris Hundt <chris(at)judicata(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: WAL receive process dies |
Date: | 2014-08-29 23:26:13 |
Message-ID: | CAK2mJFMq9XUyrgF0MorERjm2F4gp92SMpxAStm5kkje+t2TEtw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Fri, Aug 29, 2014 at 3:46 PM, Andres Freund <andres(at)2ndquadrant(dot)com>
wrote:
> [FWIW: proper quoting makes answering easier and thus more likely]
>
> On 2014-08-29 15:37:51 -0700, Patrick Krecker wrote:
> > I ran the following on the local endpoint of spiped:
> >
> > while [ true ]; do psql -h localhost -p 5445 judicata -U marbury -c
> "select
> > modtime, pg_last_xlog_receive_location(), pg_last_xlog_replay_location()
> > from replication_time"; done;
> >
> > And the same command on production and I was able to verify that the
> xlogs
> > for a given point in time were the same (modtime is updated every second
> by
> > an upstart job):
> >
> > spiped from office -> production:
> > modtime | pg_last_xlog_receive_location |
> > pg_last_xlog_replay_location
> >
> ----------------------------+-------------------------------+------------------------------
> > 2014-08-29 15:23:25.563766 | 177/2E80C9F8 |
> 177/2E80C9F8
> >
> > Ran directly on production replica:
> > modtime | pg_last_xlog_receive_location |
> > pg_last_xlog_replay_location
> >
> ----------------------------+-------------------------------+------------------------------
> > 2014-08-29 15:23:25.563766 | 177/2E80C9F8 |
> 177/2E80C9F8
> >
> > To me, this is sufficient proof that spiped is indeed talking to the
> > machine I think it's talking to (also lsof reports the correct hostname).
> >
> > I created another basebackup from the currently stuck postgres intance on
> > another machine and I also get this error:
> >
> > 2014-08-29 15:27:30 PDT FATAL: could not receive data from WAL stream:
> > ERROR: requested starting point 177/2D000000 is ahead of the WAL flush
> > position of this server 174/B76D16A8
>
> Uh. this indicates that the machine you're talking to is *not* one of
> the above as it has a flush position of '174/B76D16A8' - not something
> that's really possible when the node actually is at '177/2E80C9F8'.
>
> Could you run, on the standby that's having problems, the following
> command:
> psql 'host=127.0.0.1 port=5445 user=XXX password=XXX' -c 'IDENTIFY_SYSTEM;'
>
> Greetings,
>
> Andres Freund
>
> --
> Andres Freund http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>
RE: quoting, I wonder if Gmail is messing it up somehow? Or am I doing
something else wrong? Sorry :(
First, I apologize for the misleading information, but when I made another
basebackup and tried to use it, I configured the machine to cascade from
the stuck replica, *not* from the spiped endpoint. When I properly
connected it to the spiped endpoint it synced up fine, giving this log line:
2014-08-29 16:16:21 PDT LOG: started streaming WAL from primary at
177/4F000000 on timeline 1
The command as you gave reported a syntax error as is, but I googled a
little bit and run this one:
psql 'replication=1 dbname=XXX host=127.0.0.1 port=5445 user=XXX
password=XXX' -c 'IDENTIFY_SYSTEM;'
And it gave me this output:
systemid | timeline | xlogpos
---------------------+----------+--------------
5964163898407843711 | 1 | 177/53091990
From | Date | Subject | |
---|---|---|---|
Next Message | Patrick Dung | 2014-08-30 04:00:56 | Re: Is there a function to save schema history internally? |
Previous Message | Andres Freund | 2014-08-29 22:46:38 | Re: WAL receive process dies |