Jira database won't start after disk filled up

From: Paul Costello <paulc1217(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Jira database won't start after disk filled up
Date: 2018-03-02 21:32:13
Message-ID: CADX_Xgbnx_s3Tzk=mBZwwcYSHkf3DSFaQJaVkgkgRaZAhEpSxg@mail.gmail.com
Lists: pgsql-general

I have a database that wouldn't start because the disk filled up back on
1/10, which we did not notice until 2/27. This is Jira, so it's critical
data. It appears Jira was running in memory that entire time.

I needed to run pg_resetxlog -f in order to start the database. It
started, but upon logging in I found the system catalog and some data to be
corrupt.

I was able to run a pg_dumpall on the database and restore it to a
re-initialized cluster. However, there were 3 primary key errors during
the restore, because duplicate data had gotten into the tables.

My hypothesis is that, because of the system catalog corruption, primary
key uniqueness was not being enforced. I'm not sure when this occurred,
though: 1) right after the disk filled up, 2) when I ran pg_resetxlog -f,
or 3) after I ran pg_resetxlog and before I did the backup. Jira was still
running after I got the database started, and I waited a few hours to do
the backup. My guess is that the duplicate data got in there right after
the disk filled up on 1/10, though.
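
One way to locate the rows behind primary key errors like these is a GROUP BY ... HAVING count(*) > 1 query on the key columns. The following is a minimal sketch that only builds the SQL text and does not connect to anything. The table and column names (jiraissue, id) are placeholders and are not taken from the actual restore errors. The planner settings are included on the assumption that a possibly damaged index should not be used to answer the query; enable_indexonlyscan only exists on PostgreSQL 9.2 and later.

```python
from typing import Sequence

# Planner settings that steer the session away from index-based plans,
# on the assumption that the primary key index itself may be damaged.
PLANNER_SETTINGS = [
    "SET enable_indexscan = off;",
    "SET enable_bitmapscan = off;",
    "SET enable_indexonlyscan = off;",  # PostgreSQL 9.2 and later only
]


def duplicate_key_query(table: str, key_columns: Sequence[str]) -> str:
    """Build a query listing key values that occur more than once."""
    cols = ", ".join(f'"{c}"' for c in key_columns)
    return (
        f"SELECT {cols}, count(*) AS copies "
        f'FROM "{table}" '
        f"GROUP BY {cols} "
        f"HAVING count(*) > 1;"
    )


if __name__ == "__main__":
    for setting in PLANNER_SETTINGS:
        print(setting)
    # Hypothetical table and key column, for illustration only.
    print(duplicate_key_query("jiraissue", ["id"]))
```

The printed statements could be pasted into a psql session on the test copy, once per table that reported a primary key error.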

We had a snapshot from 1/5, which has been restored to production, such as
it is. But they created another test VM for me to attempt to bring the data
back up to 2/27.

Is there anything I can do short of pg_resetxlog -f to bring this database
back up more safely, and possibly avoid the duplicate data/primary key
errors? It wouldn't start without the force option. Should I simply shut
down Jira, try pg_resetxlog -f again, and do the pg_dumpall immediately?

These are the errors I am currently seeing while trying to start the
database.

2018-03-02 11:01:06 CST LOG: database system was interrupted; last known up at 2018-01-10 12:19:01 CST
2018-03-02 11:01:06 CST LOG: database system was not properly shut down; automatic recovery in progress
2018-03-02 11:01:06 CST LOG: redo starts at 36/B8556D58
2018-03-02 11:01:06 CST LOG: incomplete startup packet
2018-03-02 11:01:07 CST FATAL: the database system is starting up
...
2018-03-02 11:01:12 CST LOG: incomplete startup packet
2018-03-02 11:01:29 CST FATAL: the database system is starting up
...
2018-03-02 11:01:30 CST LOG: record with zero length at 36/F754CBD8
2018-03-02 11:01:30 CST LOG: redo done at 36/F754CBA8
2018-03-02 11:01:30 CST LOG: last completed transaction was at log time 2018-02-26 17:55:43.238541-06
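
To get a rough sense of how much WAL this recovery pass replayed, the two log positions can be subtracted. The following is a minimal sketch, assuming each position is written as two hexadecimal halves (high/low) of a byte offset. The helper name lsn_to_int is made up for illustration. Both positions here share the same high half (36), so the difference comes only from the low halves.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a 'HI/LO' hexadecimal log position into a single integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


# Positions copied from the log lines above.
redo_start = "36/B8556D58"  # "redo starts at"
redo_end = "36/F754CBD8"    # "record with zero length at", where replay stopped

replayed = lsn_to_int(redo_end) - lsn_to_int(redo_start)

if __name__ == "__main__":
    print(f"{replayed} bytes of WAL between redo start and end "
          f"(about {replayed / 2**20:.0f} MiB)")
```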

Any ideas or thoughts are appreciated.

Paul
