From: | Amit Kapila <amit(dot)kapila(at)huawei(dot)com> |
---|---|
To: | "'Heikki Linnakangas'" <hlinnakangas(at)vmware(dot)com> |
Cc: | "'PostgreSQL-development'" <pgsql-hackers(at)postgreSQL(dot)org>, "'Thom Brown'" <thom(at)linux(dot)com> |
Subject: | Re: Switching timeline over streaming replication |
Date: | 2012-12-06 13:39:59 |
Message-ID: | 00e101cdd3b7$2ff195d0$8fd4c170$@kapila@huawei.com |
Lists: | pgsql-hackers |
On Thursday, December 06, 2012 12:53 AM Heikki Linnakangas wrote:
> On 05.12.2012 14:32, Amit Kapila wrote:
> > On Tuesday, December 04, 2012 10:01 PM Heikki Linnakangas wrote:
> >> After some diversions to fix bugs and refactor existing code, I've
> >> committed a couple of small parts of this patch, which just add some
> >> sanity checks to notice incorrect PITR scenarios. Here's a new
> >> version of the main patch based on current HEAD.
> >
> > After testing with the new patch, the following problems are observed.
> >
> > Defect - 1:
> >
> > 1. start primary A
> > 2. start standby B following A
> > 3. start cascade standby C following B.
> > 4. start another standby D following C.
> > 5. Promote standby B.
> > 6. After successful time line switch in cascade standby C & D, stop D.
> > 7. Restart D, Startup is successful and connecting to standby C.
> > 8. Stop C.
> > 9. Restart C, startup is failing.
>
> Ok, the error I get in that scenario is:
>
> C 2012-12-05 19:55:43.840 EET 9283 FATAL: requested timeline 2 does not contain minimum recovery point 0/3023F08 on timeline 1
> C 2012-12-05 19:55:43.841 EET 9282 LOG: startup process (PID 9283) exited with exit code 1
> C 2012-12-05 19:55:43.841 EET 9282 LOG: aborting startup due to startup process failure
>
>
> That mismatch causes the error. I'd like to fix this by always treating
> the checkpoint record to be part of the new timeline. That feels more
> correct. The most straightforward way to implement that would be to peek
> at the xlog record before updating replayEndRecPtr and replayEndTLI. If
> it's a checkpoint record that changes TLI, set replayEndTLI to the new
> timeline before calling the redo-function. But it's a bit of a
> modularity violation to peek into the record like that.
>
> Or we could just revert the sanity check at beginning of recovery that
> throws the "requested timeline 2 does not contain minimum recovery point
> 0/3023F08 on timeline 1" error. The error I added to redo of checkpoint
> record that says "unexpected timeline ID %u in checkpoint record, before
> reaching minimum recovery point %X/%X on timeline %u" checks basically
> the same thing, but at a later stage. However, the way
> minRecoveryPointTLI is updated still seems wrong to me, so I'd like to
> fix that.
>
> I'm thinking of something like the attached (with some more comments
> before committing). Thoughts?
This has fixed the reported problem.
However, I am not sure whether there would be any problem if we removed the
check "requested timeline 2 does not contain minimum recovery point
0/3023F08 on timeline 1" at the beginning of recovery and just updated
replayEndTLI with ThisTimeLineID.
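
To make the two pieces under discussion concrete, here is a minimal toy model in Python. It is only a sketch and not PostgreSQL source: the `Record`, `ReplayState`, and `History` types, the function names, and all LSN values are hypothetical, and the real cause of the mismatch in this thread may differ in detail. It shows (a) peeking at a record before redo so that a timeline-switching checkpoint record is counted as part of the new timeline, and (b) a start-of-recovery style check that a (point, timeline) pair lies in the history of the requested timeline.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Record:
    end_lsn: int                   # position just past the end of this record
    new_tli: Optional[int] = None  # set only for a checkpoint record that switches timeline


@dataclass
class ReplayState:
    replay_end_lsn: int = 0        # stands in for replayEndRecPtr
    replay_end_tli: int = 1        # stands in for replayEndTLI


def note_record_before_redo(state: ReplayState, rec: Record) -> None:
    """Peek at the record and update the replay-end position before redo runs."""
    if rec.new_tli is not None and rec.new_tli != state.replay_end_tli:
        # Treat the timeline-switching checkpoint record as part of the new timeline.
        state.replay_end_tli = rec.new_tli
    state.replay_end_lsn = rec.end_lsn


# A timeline history as (tli, switch_lsn) pairs, oldest first.
# The last entry has switch_lsn None, meaning it is still open ended.
History = List[Tuple[int, Optional[int]]]


def tli_of_point(history: History, lsn: int) -> int:
    """Return the timeline that owns `lsn` according to the history."""
    for tli, switch_lsn in history:
        if switch_lsn is None or lsn < switch_lsn:
            return tli
    raise ValueError("history has no open-ended final timeline")


def history_contains(history: History, lsn: int, tli: int) -> bool:
    """Sanity check: does the history place `lsn` on timeline `tli`?"""
    return tli_of_point(history, lsn) == tli
```

In this model, take a history for timeline 2 of `[(1, 0x3000), (2, None)]` and a checkpoint record ending at `0x3100` that switches to timeline 2. After `note_record_before_redo`, the state holds timeline 2, and `history_contains(hist, 0x3100, 2)` passes. If the position were still recorded with timeline 1, the same check would fail, which is the general kind of mismatch the sanity check reports.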
With Regards,
Amit Kapila.