Re: Recovery Assistance

From: Brian Mills <brian(at)trybooking(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Recovery Assistance
Date: 2017-01-28 01:40:20
Message-ID: CAK+gLv9d=0tsVDFJsBUX7NTtKg3F=x7eVrC-g4sE0N5+9z-n5w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

First of all, Thank you for your time to assist me learning. I really
appreciate it.

root# ps ax | grep post
1364 ? Ss 0:00 /usr/lib/postfix/master
5198 pts/3 S 0:00 su postgres
5221 pts/3 S 0:00 /usr/lib/postgresql/9.3/bin/postgres -D
/etc/postgresql/9.3/main
5222 ? Ss 0:10 postgres: startup process recovering
0000000100000005000000A3
7930 pts/4 S+ 0:00 grep --color=auto post

Its a single machine postgres database server, so I'm assuming there is no
cluster log. If there is one, where would I look for it?
The only log in /var/log/postgres is postgresql-9.3-main.log
which shows (tail):

2017-01-27 20:27:01 AEDT LOG: database system was shut down at 2017-01-27
20:26:29 AEDT
2017-01-27 20:27:01 AEDT LOG: MultiXact member wraparound protections are
now enabled
2017-01-27 20:27:01 AEDT LOG: autovacuum launcher started
2017-01-27 20:27:01 AEDT LOG: database system is ready to accept
connections
2017-01-27 20:27:02 AEDT LOG: incomplete startup packet
2017-01-27 20:28:54 AEDT ERROR: unexpected chunk size 104 (expected 1996)
in chunk 3 of 22 for toast value 48862 in pg_toast_20028
2017-01-27 20:28:54 AEDT STATEMENT: COPY public.bodycontent
(bodycontentid, body, contentid, bodytypeid) TO stdout;
2017-01-27 20:30:13 AEDT LOG: received fast shutdown request
2017-01-27 20:30:13 AEDT LOG: aborting any active transactions
2017-01-27 20:30:13 AEDT LOG: autovacuum launcher shutting down
2017-01-27 20:30:13 AEDT LOG: shutting down
2017-01-27 20:30:13 AEDT LOG: database system is shut down

I ran the screen utility so I could leave the db started using the pg_ctl
command. The later logs in that session have not progressed, its last entry
is still
2017-01-27 23:00:01 AEDT FATAL: the database system is starting up
Which is still later datetime than the /var/log/postgres... log.

Connection attempt shows the same log
postgres(at)atlassian:/home/myuser$ psql
psql: FATAL: the database system is starting up

Nothing in the syslog that seems connected.

*Brian Mills*
CTO

*Mob: *0410660003
*Melbourne* 03 9012 3460 or 03 8376 6327 *|* *Sydney* 02 8064 3600 *|*
*Brisbane* 07 3173 1570
Level 1 *|* 600 Chapel Street *|* South Yarra*|* VIC *|* 3141 *|*
Australia

<https://www.facebook.com/TryBooking/> <https://twitter.com/trybooking>
<https://www.linkedin.com/company/trybooking-com>

On 28 January 2017 at 12:05, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
wrote:

> On 01/27/2017 01:31 PM, Brian Mills wrote:
>
>> Hi,
>>
>> I have a Atlassian Confluence Wiki that depends on postgres, but I
>> haven't much experience with postgres other than for this purpose.
>>
>> A few days ago, the hard disk filled, so all services stopped working.
>> When the admin realised this he increased the disk size (its in a cloud,
>> so that was easy to do) however I think the way it shutdown left
>> Postgres in an inconsistent state for some reason.
>> Now when I start it up I get this message in the log over and over again:
>> "FATAL: the database system is starting up"
>>
>> I have a backup, which I have successfully recovered, but it is 24 hours
>> old, the next backup was the cause of the disk filling. So I'm using
>> this as exercise in learning a bit more about postgres.
>>
>> I did some research and found a number of options. I took a file level
>> backup with the service not running then tried 2 things. I haven't found
>> much else on what to do though.
>>
>> *Attempt 1 - Reset Log *
>>
>> It sounded like this shouldn't be my first option (it wasn't) but it did
>> sound like what I needed to do.
>> I ran this command
>> ./pg_resetxlog /var/lib/postgresql/9.3/main -f
>> It worked a treat, the database did startup ok.
>> However when I tried to dump the DB:
>> root(at)atlassian:/home/myuser# sudo -u postgres pg_dump confluencedb >
>> $now-confluencedb.sql
>> pg_dump: Dumping the contents of table "bodycontent" failed:
>> PQgetResult() failed.
>> pg_dump: Error message from server: ERROR: unexpected chunk size 104
>> (expected 1996) in chunk 3 of 22 for toast value 48862 in pg_toast_20028
>> pg_dump: The command was: COPY public.bodycontent (bodycontentid, body,
>> contentid, bodytypeid) TO stdout;
>> The dump failed, so I assume this did leave my database in an
>> inconsistent state.
>>
>>
>> *Attempt 2 - startup manually and let it try recovery*
>>
>> I restored my file level backup and started again.
>> This time I tried to startup manually on the command line to see the
>> output (I'd done it as a service startup a number of times to nearly the
>> same effect too)
>>
>> postgres(at)atlassian:/usr/lib/postgresql/9.3/bin$ ./pg_ctl -D
>> /etc/postgresql/9.3/main start
>> server starting
>> postgres(at)atlassian:/usr/lib/postgresql/9.3/bin$ 2017-01-27 20:36:08 AEDT
>> LOG: database system was interrupted while in recovery at 2017-01-27
>> 20:13:26 AEDT
>> 2017-01-27 20:36:08 AEDT HINT: This probably means that some data is
>> corrupted and you will have to use the last backup for recovery.
>> 2017-01-27 20:36:08 AEDT LOG: database system was not properly shut
>> down; automatic recovery in progress
>> 2017-01-27 20:36:08 AEDT LOG: redo starts at 5/528B4558
>> 2017-01-27 20:36:18 AEDT LOG: redo done at 5/A3FFF9E8
>> 2017-01-27 20:36:18 AEDT LOG: last completed transaction was at log
>> time 2017-01-24 02:08:00.023064+11
>> 2017-01-27 23:00:01 AEDT FATAL: the database system is starting up
>> 2017-01-27 23:00:01 AEDT FATAL: the database system is starting up
>>
>
> What does ps ax | grep post show?
>
> Is the cluster set up to log to a file, if so what does it show?
>
> Does the system log show anything relevant?
>
>
>> I've left it that way for 12 hours, and its still not allowing
>> connections.
>> I assume its doing some kind of consistency check?
>>
>
> What does it say when you attempt a connection?
>
>
>
>> Does anyone have any suggestions on what I should be doing to try and
>> restore this database?
>>
>> - The amount of change is minimal in the DB after 6pm it should be
>> basically no change overnight.
>> - The log above seems to suggest it has completed a redo ok, is that
>> right?
>> - The last completed transaction time 2017-01-24 02:08:00.023064+11 the
>> log mentions would be fine to use, so loosing even a few hours before
>> that would be more than adequate.
>>
>> I'm just not clear what the database is doing now, or how I should be
>> trying to recover it.
>>
>> Any help anyone can suggest would be great! I've given myself this
>> weekend to attempt to recover more than the last backup, after that I
>> need to restore the service for my team to use and suck up the lost last
>> day of updates.
>>
>> Thanks,
>> Brian
>>
>
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message bym 2017-01-28 04:30:12 WindowAgg optimized out of subquery but not out of CTE, generating different results.
Previous Message Adrian Klaver 2017-01-28 01:05:26 Re: Recovery Assistance