Trying to handle db corruption 9.6

From: Mariel Cherkassky <mariel(dot)cherkassky(at)gmail(dot)com>
To: PostgreSQL mailing lists <pgsql-performance(at)postgresql(dot)org>, pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Trying to handle db corruption 9.6
Date: 2019-05-20 14:39:48
Message-ID: CA+t6e1=AHht9vqazjB8s15KH2Y7Vs1br3AVZpU5v3fJZ-Eg9Pw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-performance

Hey,
I'm trying to handle a corruption that one of our customers is facing.
His disk space was full and as a result of that he decided to run
pg_resetxlog a few times(bad idea..) .
When I connected to the machine I saw that the db was down.
When I started the db (service postgresql start) I saw the next error in
the logs :

DETAIL: Could not open file "pg_multixact/offsets/0000": No such file or
directory.

The pg_multixact/offset dir contained one file (0025).
The pg_multixact/members dir contains 2 files : 0000 and 0001.

I tried to follow the documentation of pg_resetxlog, and run pg_resetxlog
with -m 0xF0A604,0xEA50CE which are 0025*65536 and 0026*65536 in hexa.
However, it didnt help and the same error appeared.
So I tried to rename the file to 0000 and then the db searched for a file
in members that wasnt exist.
I followed the documentation and changed the multitransaction offset (-O)
and the transactions id (-c ) based on the doc and then the db was started
succesfully.
However after it started I saw the next msg in the logs :
Multixact member wraparound protections are disabled because oldest
checkpointed Multixact 65536 doesnt exist. In addition, no one is able to
connect to the db (we keep getting errors database doesnt exist or user
doesnt exist , even for postgresql user).

current relevant rows from the control data :

pg_control version number: 960

Catalog version number: 201608131

Database system identifier: 6692952810876880414

Database cluster state: shut down

pg_control last modified: Mon 20 May 2019 07:07:30 AM PDT

Latest checkpoint location: 1837/E3000028

Prior checkpoint location: 1837/E2000028

Latest checkpoint's REDO location: 1837/E3000028

Latest checkpoint's REDO WAL file: 0000000100001837000000E3

Latest checkpoint's TimeLineID: 1

Latest checkpoint's PrevTimeLineID: 1

Latest checkpoint's full_page_writes: on

Latest checkpoint's NextXID: 0:3

Latest checkpoint's NextOID: 10000

Latest checkpoint's NextMultiXactId: 131072

Latest checkpoint's NextMultiOffset: 52352

Latest checkpoint's oldestXID: 3

Latest checkpoint's oldestXID's DB: 0

Latest checkpoint's oldestActiveXID: 0

Latest checkpoint's oldestMultiXid: 65536

Latest checkpoint's oldestMulti's DB: 0

Latest checkpoint's oldestCommitTsXid:4604

Latest checkpoint's newestCommitTsXid:5041

I also checked and I saw that the customer has all the wals (backed up) but
without any basebackup..
Any recommendations how to handle the case ?

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Flo Rance 2019-05-20 14:49:35 Re: Trying to handle db corruption 9.6
Previous Message Mahmoud Moharam 2019-05-16 06:04:01 Postgres Eating cpu

Browse pgsql-performance by date

  From Date Subject
Next Message Flo Rance 2019-05-20 14:49:35 Re: Trying to handle db corruption 9.6
Previous Message Tom Lane 2019-05-17 13:35:35 Re: Analyze results in more expensive query plan