Re: Btree or not btree? That is the question

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Sabino Mullane <greg(at)endpoint(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Btree or not btree? That is the question
Date: 2012-06-21 18:37:16
Message-ID: 29443.1340303836@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greg Sabino Mullane <greg(at)endpoint(dot)com> writes:
>>> ERROR: index "pg_class_oid_index" is not a btree
>> That means you got bogus data while reading the metapage.
>> I'm beginning to wonder about the hardware on this server ...

> This happened again, and this time I went back through
> the logs and found that it is always the exact same query causing
> the issue. I also found it occuring on different servers,
> which rules out RAM anyway (still shared disk, so those are suspect).
> This query also sometimes gives errors like this:
> ERROR: could not read block 3 of relation 1663/1554846571/3925298284:
> read only 0 of 8192 bytes
> However, the final number changes: these are invariably temporary relations.

Oh really ... okay, maybe it is a software problem then.

> Any ideas on how to carefully debug this? There are a couple of quicksorts
> when I explain analyze on a non-prod system, which I am guessing where
> the temp tables come from (work_mem is 24MB).

No, that error message is complaining about an attempt to read an
actual, named, relation (could be temp, no way to be sure from this
info). A sort might generate temp files but those don't have that
kind of name.

> I'm not sure I understand
> what could be causing both the 'read 0' and btree errors for the
> same query - bad blocks on disk for one of the underlying tables?

The "could not read" thing looks more like an attempt to fetch an
invalid TID; you could get such an error for instance if you had an
index that referenced a tuple in block 3, but the table on disk isn't
that long. So one possible theory for what's happening here is that
once in a while we get confused about which shared buffer holds which
disk block, and either find the wrong block entirely when looking for
pg_class_oid_index (the first case) or grab a page of the wrong index
in the second case. This theory would be more plausible if you're
wrong about the second-case tables being temp, though, because if they
are temp then their indexes would be kept in local buffers not shared
buffers, making it much harder to believe in a single bug causing both
symptoms.

One possible mechanism for confusion of that sort would be if the
spinlock code wasn't quite right, or the compiler was incorrectly
moving loads/stores into or out of locked sections. So it might be
time to ask exactly what kind of hardware this is, which compiler
PG was built with, etc.

On the other hand, if the issue were of that sort then it ought to
affect all buffers more or less at random; so if you're consistently
seeing exactly these symptoms (in particular, if it's always
pg_class_oid_index that's complained of), then I'm not sure I believe
this theory either.

Which PG version again? Are you in the habit of doing VACUUM FULLs
on system catalogs, and if so do these glitches correlate at all
with such activities?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2012-06-21 18:53:43 Re: Incorrect behaviour when using a GiST index on points
Previous Message Jaime Casanova 2012-06-21 18:13:24 Re: [v9.3] Extra Daemons (Re: elegant and effective way for running jobs inside a database)