From: | "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> |
---|---|
To: | "Hans-Juergen Schoenig" <postgres(at)cybertec(dot)at> |
Cc: | <pgsql-patches(at)postgresql(dot)org> |
Subject: | Re: Endless recovery |
Date: | 2008-02-11 09:26:08 |
Message-ID: | 47B014B0.3010400@enterprisedb.com |
Lists: | pgsql-patches |
Hans-Juergen Schoenig wrote:
> Last week we saw a problem with a horribly configured machine.
> The disk filled up (bad FSM ;) ) and once this happened the sysadmin killed the
> system (-9).
> After two days PostgreSQL had still not started up, and they tried to restart it
> again and again, so the consistency check was started over and over
> again (thus causing more and more downtime).
> From the admin's point of view there was no way to find out whether the machine
> was actually dead or still recovering.
>
> Here is a small patch which issues a log message indicating that the recovery
> process can take ages.
> Maybe this can prevent some admins from interrupting the recovery process.
Wait, are you saying that the time was spent in the rm_cleanup phase?
That sounds unbelievable. Surely the time was spent in the redo phase, no?
> In our case, the recovery process took 3.5 days !!
That's a ridiculously long time. Was this a normal recovery, not a PITR
archive recovery? Any idea why the recovery took so long? Given the maximum
checkpoint timeout of 1h, I would expect the recovery to take at most
a few hours, even with an extremely write-heavy workload.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com