From: | Alex Kliukin <alexk(at)hintbits(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
Cc: | pgsql-admin(at)postgresql(dot)org |
Subject: | Re: 'replication checkpoint has wrong magic' on the newly cloned replicas |
Date: | 2017-11-30 10:44:29 |
Message-ID: | 1512038669.1366070.1189170512.3D900EE2@webmail.messagingengine.com |
Lists: | pgsql-admin |
On Thu, Nov 30, 2017, at 01:41, Andres Freund wrote:
>
> > It is part of replication origins feature, which is fairly new stuff
> > (see src/backend/replication/logical/origin.c). I'd bet this problem
> > is related to a bug in logical replication "origins" code rather than
> > any procedural problems in your base-backup taking setup ...
>
> Possible.
>
> What's the max_replication_origins setting? Is the system receiving
> logical replication data? Could you describe the setup a bit? Any chance
> the system's partially been running without fsync? Could you attach both
> a corrupt and a non-corrupt state file?
max_replication_slots is 5, and logical replication is not used there
at all. fsync is always turned on; the other configuration settings
from the master are attached.
The replica configuration is almost identical to the master's (we
decreased random_page_cost for systems running on SSDs).
diff /tmp/settings_master.txt /tmp/settings_replica.txt
115c115
< krb_server_keyfile FILE:/server/postgres/9.6.5/etc/krb5.keytab
---
> krb_server_keyfile FILE:/server/postgres/9.6.6/etc/krb5.keytab
186c186
< random_page_cost 3
---
> random_page_cost 1.5
194,195c194,195
< server_version 9.6.5
< server_version_num 90605
---
> server_version 9.6.6
> server_version_num 90606
222c222
< tcp_keepalives_interval 75
---
> tcp_keepalives_interval 90
239c239
< transaction_read_only off
---
> transaction_read_only on
273c273
The system is a typical OLTP system; the master normally has one
streaming physical replica and one delayed replica. At the time the
issue happened, the replica in question was the second physical replica;
after it was created, the previous replica was decommissioned.
Unfortunately, I don't have a 'corrupt' file from the replica, as the
data was reinitialized afterwards. I will try to reproduce the issue by
cloning it a couple more times. The replorigin_checkpoint from the
master is attached, but its magic seems to be fine:
od -x replorigin_checkpoint
0000000 dade 1257 b236 6a00
0000010
The same file from the current replica is identical.
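The od output above can be decoded by hand or with a few lines of Python. This is only a minimal sketch under some assumptions: that the host is little-endian (so od -x prints byte-swapped 16-bit words), that the expected magic is REPLICATION_STATE_MAGIC = 0x1257DADE as defined in src/backend/replication/logical/origin.c, and that an 8-byte file holds just the magic followed by a 4-byte CRC because no origins are configured. The helper names are made up for illustration.

```python
import struct

# Assumed value of REPLICATION_STATE_MAGIC from origin.c.
REPLICATION_STATE_MAGIC = 0x1257DADE


def words_to_bytes(od_words):
    """Rebuild the raw bytes from `od -x` words printed on a little-endian host."""
    return b"".join(struct.pack("<H", int(w, 16)) for w in od_words.split())


def read_header(raw):
    """Return (magic, stored_crc) from an 8-byte file with no origin entries."""
    magic, stored_crc = struct.unpack("<II", raw[:8])
    return magic, stored_crc


raw = words_to_bytes("dade 1257 b236 6a00")
magic, stored_crc = read_header(raw)
print(hex(magic), magic == REPLICATION_STATE_MAGIC)
print(hex(stored_crc))
```

With the words shown above, the first four bytes decode to 0x1257dade, which is why the magic looks fine here; the remaining four bytes are the stored checksum, which this sketch only prints and does not verify.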
--
Sincerely,
Alex
Attachment | Content-Type | Size |
---|---|---|
replorigin_checkpoint | application/octet-stream | 8 bytes |
settings_master.txt | text/plain | 6.5 KB |