Re: Trying to handle db corruption 9.6

From: Flo Rance <trourance(at)gmail(dot)com>
To: Mariel Cherkassky <mariel(dot)cherkassky(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-performance(at)postgresql(dot)org>, pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: Trying to handle db corruption 9.6
Date: 2019-05-20 14:49:35
Message-ID: CAHogYcXfPf6gJbXQgkOcRfjL-pJdxApvkzF3YowNz5r_5KMG7w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-performance

Hi,

First of all, as stated in the wiki, you'll need to do a filesystem level
copy of the database files and put them on another drive before attempting
to do anything else !

https://wiki.postgresql.org/wiki/Corruption

regards,
Flo

On Mon, May 20, 2019 at 4:40 PM Mariel Cherkassky <
mariel(dot)cherkassky(at)gmail(dot)com> wrote:

> Hey,
> I'm trying to handle a corruption that one of our customers is facing.
> His disk space was full and as a result of that he decided to run
> pg_resetxlog a few times(bad idea..) .
> When I connected to the machine I saw that the db was down.
> When I started the db (service postgresql start) I saw the next error in
> the logs :
>
> DETAIL: Could not open file "pg_multixact/offsets/0000": No such file or
> directory.
>
> The pg_multixact/offset dir contained one file (0025).
> The pg_multixact/members dir contains 2 files : 0000 and 0001.
>
> I tried to follow the documentation of pg_resetxlog, and run pg_resetxlog
> with -m 0xF0A604,0xEA50CE which are 0025*65536 and 0026*65536 in hexa.
> However, it didnt help and the same error appeared.
> So I tried to rename the file to 0000 and then the db searched for a file
> in members that wasnt exist.
> I followed the documentation and changed the multitransaction offset (-O)
> and the transactions id (-c ) based on the doc and then the db was started
> succesfully.
> However after it started I saw the next msg in the logs :
> Multixact member wraparound protections are disabled because oldest
> checkpointed Multixact 65536 doesnt exist. In addition, no one is able to
> connect to the db (we keep getting errors database doesnt exist or user
> doesnt exist , even for postgresql user).
>
> current relevant rows from the control data :
>
> pg_control version number: 960
>
> Catalog version number: 201608131
>
> Database system identifier: 6692952810876880414
>
> Database cluster state: shut down
>
> pg_control last modified: Mon 20 May 2019 07:07:30 AM PDT
>
> Latest checkpoint location: 1837/E3000028
>
> Prior checkpoint location: 1837/E2000028
>
> Latest checkpoint's REDO location: 1837/E3000028
>
> Latest checkpoint's REDO WAL file: 0000000100001837000000E3
>
> Latest checkpoint's TimeLineID: 1
>
> Latest checkpoint's PrevTimeLineID: 1
>
> Latest checkpoint's full_page_writes: on
>
> Latest checkpoint's NextXID: 0:3
>
> Latest checkpoint's NextOID: 10000
>
> Latest checkpoint's NextMultiXactId: 131072
>
> Latest checkpoint's NextMultiOffset: 52352
>
> Latest checkpoint's oldestXID: 3
>
> Latest checkpoint's oldestXID's DB: 0
>
> Latest checkpoint's oldestActiveXID: 0
>
> Latest checkpoint's oldestMultiXid: 65536
>
> Latest checkpoint's oldestMulti's DB: 0
>
> Latest checkpoint's oldestCommitTsXid:4604
>
> Latest checkpoint's newestCommitTsXid:5041
>
>
>
> I also checked and I saw that the customer has all the wals (backed up)
> but without any basebackup..
> Any recommendations how to handle the case ?
>

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Mariel Cherkassky 2019-05-20 15:00:11 Re: Trying to handle db corruption 9.6
Previous Message Mariel Cherkassky 2019-05-20 14:39:48 Trying to handle db corruption 9.6

Browse pgsql-performance by date

  From Date Subject
Next Message Mariel Cherkassky 2019-05-20 15:00:11 Re: Trying to handle db corruption 9.6
Previous Message Mariel Cherkassky 2019-05-20 14:39:48 Trying to handle db corruption 9.6