From: | Thom Brown <thom(at)linux(dot)com> |
---|---|
To: | Andres Freund <andres(at)2ndquadrant(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Primary not sending to synchronous standby |
Date: | 2015-02-23 15:44:43 |
Message-ID: | CAA-aLv6xLtQhhumMenF96jbxheA9JK6aC8aDERoPi428GgYriw@mail.gmail.com |
Lists: | pgsql-hackers |
On 23 February 2015 at 15:38, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Hi,
>
> On 2015-02-23 15:25:57 +0000, Thom Brown wrote:
> > I've noticed that if the primary is started and then a base backup is
> > immediately taken from it and started as a synchronous standby, it
> > doesn't replicate and the primary hangs indefinitely when trying to run any
> > WAL-generating statements. It only recovers when either the primary is
> > restarted (which has to use a fast shutdown otherwise it also hangs
> > forever), or the standby is restarted.
> >
> > Here's a way of reproducing it:
> > ...
> > Note that if you run the commands one by one, there isn't a problem. If
> > you run it as a script, the standby doesn't connect to the primary. There
> > aren't any errors reported by either the standby or the primary. The
> > primary's wal sender process reports the following:
> >
> > wal sender process rep_user 127.0.0.1(45243) startup waiting for 0/3000158
> >
> > Anyone know why this would be happening? And if this could be a problem in
> > other scenarios?
>
> Given that normally a walsender doesn't wait for syncrep I guess this is
> the above backend just did authentication. If you gdb into the
> walsender, what's the backtrace?
>
#0 0x00007f66d1725940 in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x0000000000617faa in WaitLatchOrSocket ()
#2 0x000000000064741b in SyncRepWaitForLSN ()
#3 0x00000000004bbf8f in CommitTransaction ()
#4 0x00000000004be135 in CommitTransactionCommand ()
#5 0x0000000000757679 in InitPostgres ()
#6 0x0000000000675032 in PostgresMain ()
#7 0x00000000004617ef in ServerLoop ()
#8 0x0000000000627c9c in PostmasterMain ()
#9 0x000000000046223d in main ()
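
To make the call chain in that trace easier to read, here is a minimal sketch that parses the frames and prints them outermost-first. The helper names `parse_frames` and `called_from` are made up for illustration and are not part of gdb or PostgreSQL; the regular expression assumes the plain `#N 0xADDR in func ()` frame layout shown above.

```python
import re

# Backtrace text copied verbatim from the walsender trace above.
BACKTRACE = """\
#0 0x00007f66d1725940 in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x0000000000617faa in WaitLatchOrSocket ()
#2 0x000000000064741b in SyncRepWaitForLSN ()
#3 0x00000000004bbf8f in CommitTransaction ()
#4 0x00000000004be135 in CommitTransactionCommand ()
#5 0x0000000000757679 in InitPostgres ()
#6 0x0000000000675032 in PostgresMain ()
#7 0x00000000004617ef in ServerLoop ()
#8 0x0000000000627c9c in PostmasterMain ()
#9 0x000000000046223d in main ()
"""

# Assumed frame layout: "#<n> 0x<addr> in <function> (...)".
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-f]+ in (\w+) \(")


def parse_frames(text):
    """Return the function names in the trace, innermost frame first."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append(match.group(2))
    return frames


def called_from(frames, callee, caller):
    """True if `caller` sits further out on the stack than `callee`."""
    if callee not in frames or caller not in frames:
        return False
    return frames.index(callee) < frames.index(caller)


frames = parse_frames(BACKTRACE)

# Print the chain from main() down to the blocking poll().
print(" -> ".join(reversed(frames)))

# Check whether the sync-rep wait was reached from backend initialisation.
print(called_from(frames, "SyncRepWaitForLSN", "InitPostgres"))
```

Read outermost-first, the chain shows `SyncRepWaitForLSN` being reached through `CommitTransaction` inside `InitPostgres`, i.e. the wait happens while the walsender backend is still starting up.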
--
Thom