From: | Amit Kapila <amit(dot)kapila(at)huawei(dot)com> |
---|---|
To: | "'Heikki Linnakangas'" <hlinnakangas(at)vmware(dot)com> |
Cc: | "'PostgreSQL-development'" <pgsql-hackers(at)postgreSQL(dot)org> |
Subject: | Re: Switching timeline over streaming replication |
Date: | 2012-11-16 14:01:23 |
Message-ID: | 00bd01cdc402$dd1c4ad0$9754e070$@kapila@huawei.com |
Lists: | pgsql-hackers |
On Thursday, November 15, 2012 6:05 PM Heikki Linnakangas wrote:
> On 15.11.2012 12:44, Heikki Linnakangas wrote:
> > Here's an updated version of this patch, rebased with master,
> > including the recent replication timeout changes, and some other
> cleanup.
> >
> > On 12.10.2012 09:34, Amit Kapila wrote:
> >> The test is finished from myside.
> >>
> >> one more issue:
> > > ...
> >> ./pg_basebackup -P -D ../../data_sub -X fetch -p 2303
> >> pg_basebackup: COPY stream ended before last file was finished
> >
> > Fixed this.
> >
> > However, the test scenario you point to here:
> > http://archives.postgresql.org/message-id/00a801cda6f3$4aba27b0$e02e77
> > 10$(at)kapila@huawei.com still seems to be broken, although I get a
> > different error message now.
> > I'll dig into this..
>
> Ok, here's an updated patch again, with that bug fixed.
I started by testing this patch.
Basic stuff:
------------
- Patch applies OK
- Compiles cleanly with no warnings
- Regression tests pass, except for "standbycheck".
At a glance, the "standbycheck" regression failures appear to be because the
SQL scripts and expected outputs are somewhat out of date.
The following problems were observed while testing the patch.
Defect-1:
1. Start primary A.
2. Start standby B following A.
3. Start cascading standby C following B.
4. Promote standby B.
5. After a successful timeline switch on cascading standby C, stop C.
6. Restart C; startup fails with the following error:
LOG: database system was shut down in recovery at 2012-11-16 16:26:29 IST
FATAL: requested timeline 2 does not contain minimum recovery point 0/30143A0 on timeline 1
LOG: startup process (PID 415) exited with exit code 1
LOG: aborting startup due to startup process failure
This defect has already been discussed at the following link:
http://archives.postgresql.org/message-id/00a801cda6f3$4aba27b0$e02e7710$@kapila(at)huawei(dot)com
Defect-2:
1. Start primary A.
2. Start standby B following A.
3. Start cascading standby C following B, with the 'recovery_target_timeline'
option in recovery.conf disabled.
4. Promote standby B.
5. Cascading standby C is not able to follow the new master B because of
the timeline difference.
6. Try to stop cascading standby C. The stop fails and the server does not
shut down; the WAL receiver process is still running and clients are not
allowed to connect.
Defect-2 happened only once in my test environment; I will try to
reproduce it.
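For reference, the recovery.conf used for cascading standby C in step 3 of Defect-2 could be generated roughly as sketched below. This is only an illustration under stated assumptions: the helper name write_recovery_conf, the host, and the port are hypothetical and not taken from the actual test setup, and "disabled" is read here as the 'recovery_target_timeline' line being commented out.

```shell
#!/bin/sh
# Hypothetical helper (not part of PostgreSQL): writes a recovery.conf
# for a standby into an existing data directory.
#   $1 = standby data directory (assumed to exist, e.g. from pg_basebackup)
#   $2 = port of the upstream server to stream from (assumed value)
#   $3 = "follow" to set recovery_target_timeline = 'latest';
#        any other value leaves the option commented out (disabled)
write_recovery_conf() {
    datadir=$1
    upstream_port=$2
    mode=$3
    {
        echo "standby_mode = 'on'"
        echo "primary_conninfo = 'host=localhost port=${upstream_port}'"
        if [ "$mode" = "follow" ]; then
            echo "recovery_target_timeline = 'latest'"
        else
            echo "#recovery_target_timeline = 'latest'"
        fi
    } > "${datadir}/recovery.conf"
}
```

Calling it with a third argument other than "follow" (for example `write_recovery_conf /path/to/data_C 5433 disabled`, where both the path and the port are placeholders) gives the Defect-2 configuration; passing "follow" gives the configuration in which C is expected to switch timelines after B is promoted.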
With Regards,
Amit Kapila.