Re: psql: FATAL: the database system is starting up

From: Tom K <tomkcpr(at)gmail(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: psql: FATAL: the database system is starting up
Date: 2019-06-03 04:48:04
Message-ID: CAE3EmBBxQh1_WGXLU_Lx=z3A+UezGT_b+=uusurEeBWX3OLq=A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hey Adrian,

Fixed it.

I saw the post from jebriggs but that didn't work for me so posted here.
Anyway, here's how I resolved it:

When I ran an strace on the postgres startup line, I got this:

open("pg_logical/replorigin_checkpoint", O_RDONLY) = 6
write(2, "2019-06-02 14:50:34.777 EDT [283"..., 1062019-06-02 14:50:34.777
EDT [28362] PANIC: replication checkpoint has wrong magic 0 instead of
307747550
-bash-4.2$

Notice how it's reading *pg_logical/replorigin_checkpoint* even though
wal_level = 'replicate' . It sure looks like it's doing logical
replication. Anyway, I checked the file and saw this:

-bash-4.2$ ls -altri pg_logical/
total 20
67894871 -rw-------. 1 postgres postgres 16384 Oct 29 2018
replorigin_checkpoint
136946383 drwx------. 2 postgres postgres 6 Oct 29 2018 snapshots
204367784 drwx------. 2 postgres postgres 6 Oct 29 2018 mappings
67894870 drwx------. 4 postgres postgres 65 Apr 28 06:06 .
135326272 drwx------. 21 postgres postgres 4096 Jun 2 14:50 ..
-bash-4.2$

Notice the dates and size. All way off. Now I checked the same file on
the other nodes:

[root(at)psql03 pg_logical]# ls -altri
total 8
68994432 drwx------. 2 postgres postgres 6 Oct 29 2018 snapshots
134984156 drwx------. 2 postgres postgres 6 Oct 29 2018 mappings
566745 -rw-------. 1 postgres postgres 8 May 22 01:55
replorigin_checkpoint
566731 drwx------. 4 postgres postgres 65 May 22 01:55 .
89714 drwxr-xr-x. 20 root root 4096 May 22 22:43 ..
[root(at)psql03 pg_logical]#

Dates of the other hosts replorigin_checkpoint are much more recent and
smaller.

So I take the replorigin_checkpoint and copy it over to the former primary
node ( From psql03 to psql02(primary) ) and try a restart.

Everything started up. Replication started to the other postgres nodes
without issues. Verified base/ folder on all nodes to ensure replication
is working. Every table is back to the way it was before this entire
outage.

Adrian, thank you very much for all the help here. It was much
appreciated. I've written this up and posted it at the following two links
so others can benefit and I don't forget.

https://microdevsys.com/wp/panic-replication-checkpoint-has-wrong-magic-0-instead-of-307747550/

And of course a handy way to do backups, which I should have implemented
before this happened:

https://microdevsys.com/wp/postgresql-pull-backup/

Thx,
TK

On Sun, Jun 2, 2019 at 4:48 PM Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
wrote:

> On 6/2/19 11:14 AM, Tom K wrote:
>
> >
> > Nope. wal_level was set to replica, not logical. Unless you mean
> >
> >
> > What was the role of this cluster in the original setup?
> >
> >
> > The cluster was the backend database for a number of applications. The
> > aim was to point applications to a single large cluster instead of a
> > number of small instances of postgres running all over the LAB.
> >
> > If I can get through the above error and get to the DB's and tables, I
> > could just dump them out and reinit the entire DB from the backup.
>
> The only thing I could find is the below:
>
>
> http://www.jebriggs.com/blog/2018/05/postgresql-and-panic-replication-checkpoint-has-wrong-magic-error/
>
> If that does not work I would suggest sending a new post(new subject) to
> the mailing list based on:
>
> replication checkpoint has wrong magic 0 instead of 307747550
>
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>
>
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Steven Winfield 2019-06-03 09:28:37 RE: Questions about btree_gin vs btree_gist for low cardinality columns
Previous Message Morris de Oryx 2019-06-03 00:24:15 Re: Questions about btree_gin vs btree_gist for low cardinality columns