From: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: emergency outage requiring database restart |
Date: | 2016-10-18 13:45:55 |
Message-ID: | CAHyXU0wLgMvD_KVJyfZhACBpkfDbPEawkqbx2EObYxMt2O=kMA@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Oct 17, 2016 at 2:04 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> Merlin Moncure wrote:
>
>> castaging=# CREATE OR REPLACE VIEW vw_ApartmentSample AS
>> castaging-# SELECT ...
>> ERROR: 42809: "pg_cast_oid_index" is an index
>> LINE 11: FROM ApartmentSample s
>> ^
>> LOCATION: heap_openrv_extended, heapam.c:1304
>>
>> should I be restoring from backups?
>
> It's pretty clear to me that you've got catalog corruption here. You
> can try to fix things manually as they emerge, but that sounds like a
> fool's errand.

Yeah. Believe me -- I know the drill. Most or all of the damage seemed
to be to the system catalogs, with at least two critical tables dropped
or inaccessible in some fashion. A lot of the OIDs seemed to be
pointing at the wrong thing. A couple more data points:

*) This database is OLTP, doing ~20 tps on average (but very bursty)
*) Another database on the same cluster was not impacted. However,
it's more OLAP-style and may not have been written to during the
outage

Now, the infrastructure hosting this system runs maybe 100ish
postgres clusters and maybe 1000ish sql server instances, with
approximately zero unexplained data corruption issues in the 5 years
I've been here. Having said that, this definitely smells and feels
like something on the infrastructure side. I'll follow up if I have
any useful info.

merlin