Re: 'replication checkpoint has wrong magic' on the newly cloned replicas

From:	Alex Kliukin <alexk(at)hintbits(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	pgsql-admin(at)postgresql(dot)org
Subject:	Re: 'replication checkpoint has wrong magic' on the newly cloned replicas
Date:	2017-11-30 10:44:29
Message-ID:	1512038669.1366070.1189170512.3D900EE2@webmail.messagingengine.com
Lists:	pgsql-admin

On Thu, Nov 30, 2017, at 01:41, Andres Freund wrote:
>
> > It is part of replication origins feature, which is fairly new stuff
> > (see src/backend/replication/logical/origin.c). I'd bet this problem
> > is related to a bug in logical replication "origins" code rather than
> > any procedural problems in your base-backup taking setup ...
>
> Possible.
>
> What's the max_replication_origins setting? Is the system receiving
> logical replication data? Could you describe the setup a bit? Any chance
> the system's partially been running without fsync? Could you attach both
> a corrupt and a non-corrupt state file?

max_replication_slots is 5, and logical replication is not used there
at all. fsync is always turned on; the other configuration settings
from the master are attached.

The replica configuration is almost identical to the master's (we
decreased random_page_cost for systems running on SSDs).

diff /tmp/settings_master.txt /tmp/settings_replica.txt
115c115
< krb_server_keyfile FILE:/server/postgres/9.6.5/etc/krb5.keytab
---
> krb_server_keyfile FILE:/server/postgres/9.6.6/etc/krb5.keytab
186c186
< random_page_cost 3
---
> random_page_cost 1.5
194,195c194,195
< server_version 9.6.5
< server_version_num 90605
---
> server_version 9.6.6
> server_version_num 90606
222c222
< tcp_keepalives_interval 75
---
> tcp_keepalives_interval 90
239c239
< transaction_read_only off
---
> transaction_read_only on
273c273

The system is a typical OLTP system; the master normally has a single
streaming physical replica and one delayed one. At the time the issue
happened, the replica in question was the second physical replica; after
it was created, the previous replica was decommissioned.

Unfortunately, I don't have a 'corrupt' file from the replica, as the
data was reinitialized afterwards. I will try to reproduce the issue
by cloning it a couple more times. The replorigin_checkpoint from the
master is attached, but its magic seems to be fine:

od -x replorigin_checkpoint
0000000 dade 1257 b236 6a00
0000010

The same file from the current replica is identical.
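
For reference, `od -x` prints 16-bit words in host byte order, so on a little-endian machine the words `dade 1257` correspond to the bytes `de da 57 12`, which read as a little-endian 32-bit integer give 0x1257DADE. The sketch below redoes that decoding in Python. It assumes that REPLICATION_STATE_MAGIC in origin.c is 0x1257DADE (worth confirming against the source tree in use) and that the remaining four bytes of an 8-byte file are the trailing checksum; the helper names are made up for illustration, and the checksum itself is not verified here.

```python
import struct

# Assumption: value of REPLICATION_STATE_MAGIC from origin.c; confirm in your tree.
EXPECTED_MAGIC = 0x1257DADE


def od_words_to_bytes(words):
    """Rebuild raw bytes from the 16-bit hex words shown by `od -x`
    on a little-endian host (hypothetical helper)."""
    return b"".join(struct.pack("<H", int(w, 16)) for w in words)


def read_magic(raw):
    """Interpret the first four bytes as a little-endian uint32."""
    if len(raw) < 4:
        raise ValueError("file is too short to contain a magic number")
    (magic,) = struct.unpack_from("<I", raw, 0)
    return magic


# The words from the od output above.
raw = od_words_to_bytes("dade 1257 b236 6a00".split())
magic = read_magic(raw)

# Remaining four bytes; assumed to be the checksum, not validated here.
(trailer,) = struct.unpack_from("<I", raw, 4)

print(hex(magic), magic == EXPECTED_MAGIC)
```

To check a file on disk instead of pasted od output, `read_magic(open(path, "rb").read())` applies the same decoding to its first four bytes.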

--
Sincerely,
Alex

Attachment	Content-Type	Size
replorigin_checkpoint	application/octet-stream	8 bytes
settings_master.txt	text/plain	6.5 KB
