From: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: emergency outage requiring database restart |
Date: | 2016-10-26 17:43:29 |
Message-ID: | CAHyXU0z7O7TqXOk9mj7A9KxT6r5aczjCBZQbvgP7vsrjfDeH6g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Oct 26, 2016 at 11:35 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> On Tue, Oct 25, 2016 at 3:08 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> Confirmation of problem re-occurrence will come in a few days. I'm
>> much more likely to believe 6+sigma occurrence (storage, freak bug,
>> etc) should it prove the problem goes away post rebuild.
>
> ok, no major reported outage yet, but just got:
>
> 2016-10-26 11:27:55 CDT [postgres(at)castaging]: ERROR: invalid page in
> block 12 of relation base/203883/1259
*) still on 9.5.2 :( not in control of the package update process
*) getting more of these:
[root(at)rcdylsdbmpf001 data]# cat
/var/lib/pgsql/9.5/data_checksum/pg_log/postgresql-26.log | grep
"invalid page"
2016-10-26 11:26:42 CDT [postgres(at)castaging]: ERROR: invalid page in
block 10 of relation base/203883/1259
2016-10-26 11:27:55 CDT [postgres(at)castaging]: ERROR: invalid page in
block 12 of relation base/203883/1259
2016-10-26 12:16:44 CDT [postgres(at)castaging]: ERROR: invalid page in
block 13 of relation base/203883/1259
2016-10-26 12:18:58 CDT [postgres(at)castaging]: ERROR: invalid page in
block 15 of relation base/203883/1259
2016-10-26 12:19:12 CDT [postgres(at)castaging]: ERROR: invalid page in
block 7 of relation base/203883/2662
castaging=# select relname from pg_class where oid in(1259, 2662);
relname
────────────────────
pg_class_oid_index
pg_class
*) only one database in the cluster is damaged. I do not suspect
storage at this point
*) Currently I can execute pg_dump with no problems. So far impact
is not severe but soon I have to do dump/restore
*) age(xmin) from pg_class gives all reasonable values (approx 757k)
that aren't 2147483647
*) All cases of statement executing getting this error in a pl/pgsql
routine that does the following:
BEGIN
...
<several postgres_fdw querying same 1 record 1 column tabel>
DROP TABLE IF EXISTS foo;
CREATE TEMP TABLE foo ON COMMIT DROP AS ...
SELECT <sqsh function push sql script to sql sever> ... INTO Result;
DROP TABLE IF EXISTS bar;
CREATE TEMP TABLE bar ON COMMIT DROP AS ...
PERFORM <sqsh function removing script>;
EXCEPTION
WHEN OTHERS THEN
PERFORM <sqsh function removing script>;
RAISE;
END;
*) page verification miss is numerically close
[root(at)rcdylsdbmpf001 ~]# cat
/var/lib/pgsql/9.5/data/pg_log/postgresql-26.log | grep "page
verification"
2016-10-26 11:26:42 CDT [postgres(at)castaging]: WARNING: page
verification failed, calculated checksum 37251 but expected 37244
2016-10-26 11:27:55 CDT [postgres(at)castaging]: WARNING: page
verification failed, calculated checksum 37249 but expected 37244
2016-10-26 12:16:44 CDT [postgres(at)castaging]: WARNING: page
verification failed, calculated checksum 44363 but expected 44364
2016-10-26 12:18:58 CDT [postgres(at)castaging]: WARNING: page
verification failed, calculated checksum 49525 but expected 49539
2016-10-26 12:19:12 CDT [postgres(at)castaging]: WARNING: page
verification failed, calculated checksum 37345 but expected 37340
*) sample log leading into the first error (all queries are logged)
2016-10-26 11:26:40 CDT [postgres(at)castaging]: LOG: execute <unnamed>:
SELECT PushMarketSample($1::TEXT) AS published
2016-10-26 11:26:40 CDT [postgres(at)castaging]: DETAIL: parameters: $1 = '8840'
2016-10-26 11:26:40 CDT [postgres(at)mpf2]: LOG: statement: START
TRANSACTION ISOLATION LEVEL REPEATABLE READ
2016-10-26 11:26:40 CDT [postgres(at)mpf2]: LOG: statement: SAVEPOINT s2
2016-10-26 11:26:40 CDT [postgres(at)mpf2]: LOG: execute <unnamed>:
DECLARE c1 CURSOR FOR
SELECT period FROM public.dataentryperiod
2016-10-26 11:26:40 CDT [postgres(at)mpf2]: LOG: statement: FETCH 100 FROM c1
2016-10-26 11:26:41 CDT [postgres(at)mpf2]: LOG: statement: CLOSE c1
2016-10-26 11:26:41 CDT [postgres(at)mpf2]: LOG: execute <unnamed>:
DECLARE c2 CURSOR FOR
SELECT period FROM public.dataentryperiod
2016-10-26 11:26:41 CDT [postgres(at)mpf2]: LOG: statement: FETCH 100 FROM c2
2016-10-26 11:26:41 CDT [postgres(at)mpf2]: LOG: statement: CLOSE c2
2016-10-26 11:26:41 CDT [rms(at)mpf2]: ERROR: relation "tblagent" does
not exist at character 15
2016-10-26 11:26:41 CDT [rms(at)mpf2]: STATEMENT: select * from tblagent
2016-10-26 11:26:41 CDT [rms(at)mpf2]: LOG: execute <unnamed>: SELECT
CURRENT_SCHEMA()
2016-10-26 11:26:42 CDT [postgres(at)mpf2]: LOG: execute <unnamed>:
DECLARE c3 CURSOR FOR
SELECT period FROM public.dataentryperiod
2016-10-26 11:26:42 CDT [postgres(at)mpf2]: LOG: statement: FETCH 100 FROM c3
2016-10-26 11:26:42 CDT [postgres(at)mpf2]: LOG: statement: CLOSE c3
2016-10-26 11:26:42 CDT [postgres(at)mpf2]: LOG: execute <unnamed>:
DECLARE c4 CURSOR FOR
SELECT period FROM public.dataentryperiod
2016-10-26 11:26:42 CDT [postgres(at)mpf2]: LOG: statement: FETCH 100 FROM c4
2016-10-26 11:26:42 CDT [postgres(at)mpf2]: LOG: statement: CLOSE c4
2016-10-26 11:26:42 CDT [postgres(at)castaging]: WARNING: page
verification failed, calculated checksum 37251 but expected 37244
2016-10-26 11:26:42 CDT [postgres(at)castaging]: CONTEXT: SQL statement
"CREATE TEMP TABLE ApartmentPublishBySubmkt ON COMMIT DROP AS
SELECT
ApartmentId,
TotalUnits,
SubmarketId,
SUM(Rent * UnitCount) / NULLIF(SUM(UnitCount), 0) AS Rent,
SUM(Occupancy * UnitCount) / NULLIF(SUM(UnitCount), 0) AS Occupancy
FROM AptSample
GROUP BY 1, 2, 3"
PL/pgSQL function pushmarketsample(text,date,integer) line 103
at SQL statement
2016-10-26 11:26:42 CDT [postgres(at)mpf2]: LOG: statement: ROLLBACK TO
SAVEPOINT s2; RELEASE SAVEPOINT s2
2016-10-26 11:26:42 CDT [postgres(at)castaging]: ERROR: invalid page in
block 10 of relation base/203883/1259
2016-10-26 11:26:42 CDT [postgres(at)castaging]: CONTEXT: SQL statement
"CREATE TEMP TABLE ApartmentPublishBySubmkt ON COMMIT DROP AS
SELECT
ApartmentId,
TotalUnits,
SubmarketId,
SUM(Rent * UnitCount) / NULLIF(SUM(UnitCount), 0) AS Rent,
merlin
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2016-10-26 17:49:58 | Re: Improving RLS planning |
Previous Message | Robert Haas | 2016-10-26 17:39:01 | Re: Improving RLS planning |