Re: 'replication checkpoint has wrong magic' on the newly cloned replicas

From: Alex Kliukin <alexk(at)hintbits(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: 'replication checkpoint has wrong magic' on the newly cloned replicas
Date: 2017-11-30 09:23:28
Message-ID: 1512033808.1345812.1189147312.07BDA54D@webmail.messagingengine.com
Lists: pgsql-admin


On Thu, Nov 30, 2017, at 00:22, Alvaro Herrera wrote:
> Alex Kliukin wrote:
>
> > 2017-11-15 13:15:46.673 CET,,,15154,,5a0c2ff1.3b32,5,,2017-11-15
> > 13:15:45 CET,,0,PANIC,XX000,"replication checkpoint has wrong magic
> > 5714534 instead of 307747550",,,,,,,,,""
>
> Uhh ... I had never heard of this "replication checkpoint" thing. It is
> part of replication origins feature, which is fairly new stuff (see
> src/backend/replication/logical/origin.c). I'd bet this problem is
> related to a bug in logical replication "origins" code rather than any
> procedural problems in your base-backup taking setup ...

We are not using logical replication or logical decoding on those hosts.
On the master, both pg_replication_origin and pg_replication_slots are
empty.

Those masters were upgraded from 9.3 fairly recently (around 2 months
ago).

>
> I wonder if there is some truncation of the 0x1257DADE value that
> produces the 5714534 value you're seeing -- something related to a
> pg_logical/replorigin_checkpoint file being written partially while the
> backup is being taken.

307747550 = 0x1257DADE
0001 0010 0101 0111 1101 1010 1101 1110

5714534 = 0x573266 = "W2f" in ASCII
0000 0000 0101 0111 0011 0010 0110 0110

I see no patterns here.

What is interesting is that 0x573266 does appear in
src/backend/utils/cache/relcache.c:

#define RELCACHE_INIT_FILENAME  "pg_internal.init"
#define RELCACHE_INIT_FILEMAGIC 0x573266 /* version ID value */

It's the file magic for the relcache init files, but since the copy is
made by simply compressing and decompressing the original files, I don't
see how software could confuse the two.

>
> Another point towards not including pg_logical/ contents when taking a
> base backup, I guess ...

In our case, wouldn't that just mask the real issue?
--
Sincerely,
Alex
