Re: What could cause CREATE TEMP... "could not read block" error?

From: Chris Richards <chris(at)infinite(dot)io>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: What could cause CREATE TEMP... "could not read block" error?
Date: 2015-11-24 20:48:46
Message-ID: CAOan6T=BS0YTt-+HpHgGZr_h9+2P8cWheKyg_NmHyOKaGpu9KQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> Are you running the latest minor release for your PostgreSQL version?

No. 9.3.9 whereas the lastest is 9.3.10

> Were there any crashes recently?

No[*]. See comments below about LVM snapshots.

> Do you have "fsync = on"?

Yes.

> Did you make sure that you have a reliable storage system?

RAID-6 SSDs on ext4

I've observed the problem on other than the "pg_depend" table. A normal
table (called "blocks") had the problem. A "REINDEX TABLE blocks" did not
fix the problem, however a "VACUUM FULL" did "fix" it.

We had thought we figured out what might have caused the problem. We were
taking LVM snapshots while the database was running and then at sometime
later we reverted the snapshot (while postgres was stopped). Our theory was
that since postgres was running and therefore we captured its backing store
in an indeterminate state. We changed our snapshot process to shutdown
postgres, take the LVM snapshot, and then start postgres.

Unfortunately, the problem cropped up again.

We had an empty database except for our created tables, took the snapshot,
and then populated the tables using normal INSERT/UPDATE/DELETEs while the
system ran. Then, we reboot and revert the LVM snapshot--back to the empty
database. We then populated the tables by using a series "COPY <table> FROM
<file>". The only things special about this is we dropped one foreign key
constraint and that with our data files is that we may issue back-to-back
COPY's to the same table (each COPY pulling in different data as we
pre-process the data before COPY'ing it). The amount of data is relatively
small from our COPY's:

dirtable - 1 row
dirpath - 1 row
cloud - 940 rows
blocks - 176 rows (before it died)

Indexes were not disabled during the bulk import.

Chris

On Mon, Nov 23, 2015 at 2:59 AM, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
wrote:

> Chris Richards wrote:
> > Adrian is correct. This worked by itself whereas using it in the
> creation of the temporary table
> > failed.
> > mdb-> SELECT pq.* FROM policyqueue AS pq
> > mdb-> JOIN seed_progress AS sp ON pq.id <http://pq.id/> =sp.polidx;
> >
> > I checked the query Albe suggested; there were two `relfilenode`s (11936
> and 11937) that exhibited the
> > error. Respectively, they were pg_depend_depender_index and
> pg_depend_reference_index.
> >
> > Unfortunately, I didn't disable the nightly processes and something
> must have(?) fixed the glitch; at
> > midnight GMT the query ran successfully. Ugh.
> >
> > If it crops up again, I have some tools to try and capture data
> immediately, and the suggested REINDEX
> > since both appear to be indices.
>
> These indexes are both on the system table "pg_depend".
>
> That explains why the query ran without problems but the CREATE TABLE
> didn't:
> Creating an object creates dependencies, and you have data corruption in
> the
> system table that tracks dependencies.
>
> I would be pretty worried in your place if I had a corrupted catalog, even
> if
> it seems to have "fixed itself". There might be other tables with
> corruption.
>
> I would do two things:
> 1) Wait for a moment when there is little database traffic and
> run "REINDEX TABLE pg_depend;" to rebuild these indexes.
>
> 2) As soon as you can schedule some downtime, pg_dumpall the database
> cluster, stop the server, throw away the database cluster, create a new
> one with "initdb" and restore the dump into that.
> That will get rid of any lurking data corruption.
> Watch out for error messages during the pg_dumpall!
>
>
> The main question is of course how you got the corruption in the first
> place.
> Are you running the latest minor release for your PostgreSQL version?
> Were there any crashes recently?
> Do you have "fsync = on"?
> Did you make sure that you have a reliable storage system?
>
> Yours,
> Laurenz Albe
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Chris Withers 2015-11-24 21:33:43 "trust" authentication in pg_hba.conf
Previous Message Caleb Meredith 2015-11-24 20:30:11 Can row level security policies also be implemented for views?