From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
Cc: | Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Too strict check when starting from a basebackup taken off a standby |
Date: | 2014-12-18 08:30:01 |
Message-ID: | 20141218083001.GY5023@alap3.anarazel.de |
Lists: | pgsql-hackers |
On 2014-12-16 18:37:48 +0200, Heikki Linnakangas wrote:
> On 12/11/2014 04:21 PM, Marco Nenciarini wrote:
> >On 11/12/14 12:38, Andres Freund wrote:
> >>On December 11, 2014 9:56:09 AM CET, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
> >>>On 12/11/2014 05:45 AM, Andres Freund wrote:
> >>>
> >>>Yeah. I was not able to reproduce this, but I'm clearly missing
> >>>something, since both you and Sergey have seen this happening. Can you
> >>>write a script to reproduce?
> >>
> >>Not right now, I only have my mobile... It's quite easy though. Create a base backup from a standby with pg_basebackup. Create a recovery.conf with a broken primary_conninfo. Start. Shut down. Fix the conninfo. Start.
> >>
> >
> >Just tested it. These steps are not sufficient to reproduce the issue on
> >a test installation. I suppose that is because, on a small test datadir,
> >the checkpoint location and the redo location in pg_control are the same
> >as those present in the backup_label.
> >
> >To trigger this bug, at least one restartpoint must happen on the
> >standby between the start and the end of the backup.
> >
> >You can simulate it by issuing a checkpoint on the master, then a
> >checkpoint on the standby (to force a restartpoint), and then copying
> >pg_control from the standby.
> >
> >This way I've been able to reproduce it.
>
> Ok, got it. I was able to reproduce this by using pg_basebackup
> --max-rate=1024, and issuing "CHECKPOINT" in the standby while the backup
> was running.
FWIW, I can reproduce it without any such hangups. I've just tested it
using my local scripts:
# create primary
$ reinit-pg-dev-master
$ run-pg-dev-master
# create first standby
$ reinit-pg-dev-master-standby
$ run-pg-dev-master-standby
# create 2nd standby
$ pg_basebackup -h /tmp/ -p 5441 -D /tmp/tree --write-recovery-conf
$ PGHOST=frakbar run-pg-dev-master-standby -D /tmp/tree
LOG: creating missing WAL directory "pg_xlog/archive_status"
LOG: entering standby mode
FATAL: could not connect to the primary server: could not translate host name "frakbar" to address: Name or service not known
$ PGHOST=/tmp run-pg-dev-master-standby -D /tmp/tree
LOG: started streaming WAL from primary at 0/2000000 on timeline 1
FATAL: backup_label contains data inconsistent with control file
HINT: This means that the backup is corrupted and you will have to use another backup for recovery.
After the fix I just pushed, that sequence works.
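To tell whether a given backup is in the state Marco describes above (pg_control already past the checkpoint recorded in backup_label), something like the following rough sketch may help. It only compares the two files as text and does not reproduce the server's own startup check. The helper names (parse_lsn, field, backup_label_matches_control) are made up for illustration, and the field labels assume the backup_label and pg_controldata output formats of current releases.

```python
import re

# An LSN is printed as two hex numbers separated by a slash, e.g. 0/2000028.
LSN_RE = re.compile(r"([0-9A-Fa-f]+)/([0-9A-Fa-f]+)")


def parse_lsn(text):
    """Return the first LSN found in text as a 64-bit integer."""
    m = LSN_RE.search(text)
    if m is None:
        raise ValueError("no LSN found in %r" % text)
    hi, lo = m.groups()
    return (int(hi, 16) << 32) | int(lo, 16)


def field(text, prefix):
    """Return the remainder of the first line that starts with prefix."""
    for line in text.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):]
    raise KeyError(prefix)


def backup_label_matches_control(backup_label, controldata):
    """True if both files name the same checkpoint and redo location.

    backup_label: contents of the backup_label file.
    controldata:  output of pg_controldata for the same data directory.
    """
    label_redo = parse_lsn(field(backup_label, "START WAL LOCATION:"))
    label_ckpt = parse_lsn(field(backup_label, "CHECKPOINT LOCATION:"))
    ctl_ckpt = parse_lsn(field(controldata, "Latest checkpoint location:"))
    ctl_redo = parse_lsn(field(controldata, "Latest checkpoint's REDO location:"))
    return (label_redo, label_ckpt) == (ctl_redo, ctl_ckpt)
```

If this returns True for a backup, the simple start/stop/start sequence will probably not hit the error, which would match Marco's observation on a small test datadir.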
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services