Re: psql: FATAL: the database system is starting up

From: Tom K <tomkcpr(at)gmail(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: psql: FATAL: the database system is starting up
Date: 2019-06-01 19:42:42
Message-ID: CAE3EmBC0MAz7JetpL=JzCL9unBSUko6Q0HKKwFArLb6gXMfEhw@mail.gmail.com
Lists: pgsql-general

On Sat, Jun 1, 2019 at 3:32 PM Tom K <tomkcpr(at)gmail(dot)com> wrote:

>
>
> On Sat, Jun 1, 2019 at 9:55 AM Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
> wrote:
>
>> On 5/31/19 7:53 PM, Tom K wrote:
>> >
>>
>> > There are two places to connect with the Patroni community: on github,
>> > via Issues and PRs, and on channel #patroni in the PostgreSQL Slack. If
>> > you're using Patroni, or just interested, please join us.
>> >
>> >
>> > Will post there as well. Thank you. My thinking was to post here first
>> > since I suspect the Patroni community will simply refer me back here
>> > given that the PostgreSQL errors are originating directly from
>> > PostgreSQL.
>> >
>> >
>> > That being said, can you start the copied Postgres instance without
>> > using the Patroni instrumentation?
>> >
>> >
>> > Yes, that is something I have been trying to do actually. But I hit a
>> > dead end with the three errors above.
>> >
>> > So what I did was copy a single node's backed-up data files to
>> > */data/patroni* on the same node (this is the psql data directory as
>> > defined through Patroni), then ran this (psql03 = 192.168.0.118):
>> >
>> > # sudo su - postgres
>> > $ /usr/pgsql-10/bin/postgres -D /data/patroni
>> > --config-file=/data/patroni/postgresql.conf
>> > --listen_addresses=192.168.0.118 --max_worker_processes=8
>> > --max_locks_per_transaction=64 --wal_level=replica
>> > --track_commit_timestamp=off --max_prepared_transactions=0 --port=5432
>> > --max_replication_slots=10 --max_connections=100 --hot_standby=on
>> > --cluster_name=postgres --wal_log_hints=on --max_wal_senders=10 -d 5
>>
>> Why all the options?
>> That should be covered in postgresql.conf, no?
>>
>> >
>> > This resulted in one of the 3 messages above. Hence the post here. If
>> > I can start a single instance, I should be fine since I could then 1)
>> > replicate over to the other two or 2) simply take a dump, reinitialize
>> > all the databases then restore the dump.
>> >
>>
>> What if you move the recovery.conf file out?
>
>
> Will try.
>
>
>>
>> The below looks like missing/corrupted/incorrect files. Hard to tell
>> without knowing what Patroni did?
>
>
> Storage disappeared from underneath these clusters. The OS was of course
> still in memory making futile attempts to write to disk, which would never
> complete.
>
> My best guess is that Patroni or Postgres was in the middle of some
> writes across the clusters when the failure occurred.
>

Of note are the characters f2W in the PSQL03 error below. I see nothing in
the postgres source code to indicate that this is a recognizable postgres
message. Part of me suspected that the postgres binaries had been corrupted;
I had that happen once with glib-common, and a reinstall fixed it. However,
the checksum of the postgres binaries matches a standalone install exactly,
so that should not be the issue.
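
One possible reading, offered as an assumption rather than a diagnosis: the
"syntax error in history file" message echoes the offending line of a
timeline history file, so f2W may be the file's own (damaged) content rather
than anything produced by the binaries. A minimal sketch for checking that
is below. The function name check_history_files is made up for illustration,
the pattern is only an approximation of what the server expects on each line
(a numeric timeline ID followed by a hex/hex switchpoint), and it assumes the
history files live under pg_wal in the data directory and that grep supports
-a:

```shell
#!/bin/sh
# Hypothetical helper: report *.history files containing lines that do not
# look like "<timeline id> <whitespace> <hex>/<hex> ...".
# Blank lines and lines starting with '#' are ignored.
check_history_files() {
    wal_dir=$1
    for f in "$wal_dir"/*.history; do
        # If the glob matched nothing, skip the literal pattern.
        [ -e "$f" ] || continue
        # -a forces text mode so binary garbage is still inspected line by line.
        bad=$(grep -a -v -E '^[[:space:]]*(#.*)?$' "$f" \
              | grep -a -v -E '^[[:space:]]*[0-9]+[[:space:]]+[0-9A-Fa-f]+/[0-9A-Fa-f]+' \
              || true)
        if [ -n "$bad" ]; then
            echo "SUSPECT $f"
        else
            echo "OK $f"
        fi
    done
}

# Example (path assumed from the data directory used in this thread):
#   check_history_files /data/patroni/pg_wal
```

Any file reported as SUSPECT can then be looked at by hand (for example with
od -c) before deciding whether to move it aside.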

>
>>
>> > Using the above procedure I get one of three error messages when using
>> > the data files of each node:
>> >
>> > [ PSQL01 ]
>> > postgres: postgres: startup process waiting for 000000010000000000000008
>> >
>> > [ PSQL02 ]
>> > PANIC: replication checkpoint has wrong magic 0 instead of 307747550
>> >
>> > [ PSQL03 ]
>> > FATAL: syntax error in history file: f2W
>> >
>> > And I can't start any one of them.
>> >
>> >
>> >
>> > >
>> > > Thx,
>> > > TK
>> > >
>> >
>> >
>> >
>> > --
>> > Adrian Klaver
>> > adrian(dot)klaver(at)aklaver(dot)com <mailto:adrian(dot)klaver(at)aklaver(dot)com>
>> >
>>
>>
>> --
>> Adrian Klaver
>> adrian(dot)klaver(at)aklaver(dot)com
>>
>
