From: | Amit Kapila <amit(dot)kapila(at)huawei(dot)com> |
---|---|
To: | "'Kyotaro HORIGUCHI'" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
Cc: | <masao(dot)fujii(at)gmail(dot)com>, <hlinnakangas(at)vmware(dot)com>, <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Fast promotion failure |
Date: | 2013-05-13 03:07:27 |
Message-ID: | 006801ce4f87$005569e0$01003da0$@kapila@huawei.com |
Lists: | pgsql-hackers |
On Monday, May 13, 2013 5:54 AM Kyotaro HORIGUCHI wrote:
> 2013/05/10 20:01 "Amit Kapila" <amit(dot)kapila(at)huawei(dot)com>:
> > > > C 2013-05-10 15:32:32.170 JST 9242 FATAL: could not receive data from WAL stream:
> >
> > Is there any chance that a network glitch caused this one-time
> > error?
>
> Unix domain sockets are hardly likely to have such troubles. This
> test ran within a single host.
>
> > > I'm confused; the patch seems to me to ensure that the "first
> > > checkpoint after fast promotion is performed" uses the
> > > "correct, new, ThisTimeLineID".
> >
> > What is your confusion?
>
> Heikki said in the first message in this thread that he suspected
> the cause of the failure he had seen to be the wrong TLI on which
> the checkpointer runs. Nevertheless, the patch you suggested to me
> looks like it fixes it. Moreover, (one of?) the failures from the same
> cause looks fixed with the patch.
There were 2 problems:
1. An issue in the walsender logic caused it to hit an assertion or error
in some cases after promotion.
2. During fast promotion, the checkpoint gets created with the wrong TLI.
Heikki has provided 2 different patches:
fix-standby-promotion-assert-fail-2.patch and
fast-promotion-quick-fix.patch.
Of the 2, he has already committed fix-standby-promotion-assert-fail-2.patch
(http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2ffa66f4975c99e52984f7ee81b47d137b5b4751)
> Is the point of this discussion that the patch may leave out some
> glitch in the timing of timeline-related changes, and that Heikki saw a
> symptom of that?
AFAIU, the committed patch leaves a gap in the overall scenario, and that
gap is the fast promotion issue.
With Regards,
Amit Kapila.