Re: Better HINT message for "unexpected data beyond EOF"

From: Andres Freund <andres(at)anarazel(dot)de>
To: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Better HINT message for "unexpected data beyond EOF"
Date: 2025-03-27 14:12:11
Message-ID: gluttro6ro2lsn7mvs6i6ihdhi4futxpgljyhslcvguci2a5rd@xteikqd6ftos
Lists: pgsql-hackers

Hi,

On 2025-03-27 10:25:50 +0100, Jakub Wartak wrote:
> On Wed, Mar 26, 2025 at 4:01 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> [..]
> > > so how about:
> > > -HINT: This has been seen to occur with buggy kernels; consider
> > > updating your system.
> > > +HINT: This has been observed with files being overwritten, buggy
> > > kernels and potentially other external file system influence.
> >
> > I agree that we should emphasize the possibility of files being
> > overwritten.
>
> > I'm not sure we should even mention buggy kernels -- is
> > there any evidence that's still a thing on still-running hardware?
>
> No, I do not have any, other than comments in source code from Tom.

FWIW, I'm not sure how much that was ever true. We certainly had our own bugs
that could lead to the error occurring.

> E.g. I've tracked down that Pavan fixed something in the 2ndQ
> fast_redo/pg_xlog_prefetch extension in 2016, where a concurrency
> bug in that extension was causing a similar problem back then on at
> least one occasion: ```...issue was caused because the prefetch worker
> process reading back blocks that are being concurrently dropped by the
> startup process (as a result of truncate operation). When the startup
> process later tries to extend the relation, it finds an existing valid
> block in the shared buffers and panics. ``` (sounds like it is related
> to data beyond EOF).

FWIW that's more generally broken than just this error. You can't just read in
data without holding a lock on the relation; that will cause breakage in all
kinds of ways.
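
To make the failure mode concrete, here is a toy sketch of the interleaving
described in the quoted report. It is not PostgreSQL code: ToyRelation,
start_read, finish_read and replay are hypothetical names, the "cache" is a
plain dict standing in for shared buffers, and the error string is only
modelled on the real message. It assumes a reader that holds no lock can have
a truncation slip in between starting and finishing a block read.

```python
class ToyRelation:
    """Toy stand-in for a relation file plus a buffer cache (hypothetical)."""

    def __init__(self, nblocks):
        self.disk = {n: f"block-{n}" for n in range(nblocks)}  # on-disk blocks
        self.cache = {}                                        # cached blocks

    def start_read(self, blkno):
        # Reader fetches the block contents from disk.
        return self.disk[blkno]

    def finish_read(self, blkno, data):
        # Reader installs the fetched contents in the cache.
        self.cache[blkno] = data

    def truncate(self, new_nblocks):
        # Drop blocks >= new_nblocks from disk and invalidate cached copies.
        for blkno in [b for b in self.disk if b >= new_nblocks]:
            del self.disk[blkno]
        for blkno in [b for b in self.cache if b >= new_nblocks]:
            del self.cache[blkno]

    def extend(self):
        # A new block must not already be present in the cache.
        blkno = len(self.disk)
        if blkno in self.cache:
            raise RuntimeError(f"unexpected data beyond EOF in block {blkno}")
        self.disk[blkno] = f"new-{blkno}"
        return blkno


def replay(unlocked):
    """Replay one fixed interleaving; return what the later extension sees."""
    rel = ToyRelation(4)
    if unlocked:
        # No lock: the truncation lands in the middle of the read.
        data = rel.start_read(3)
        rel.truncate(3)
        rel.finish_read(3, data)   # stale block 3 re-enters the cache
    else:
        # With a lock, the read completes before the truncation can run.
        data = rel.start_read(3)
        rel.finish_read(3, data)
        rel.truncate(3)            # the cached copy is invalidated here
    try:
        return f"extended to block {rel.extend()}"
    except RuntimeError as exc:
        return str(exc)


print(replay(unlocked=True))
print(replay(unlocked=False))
```

In the unlocked replay the stale block survives the truncation, so the later
extension trips over it; in the locked replay the extension succeeds. The real
buffer manager is far more involved, but the ordering problem is the same.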

> Proposals:
> 1. HINT: This has been observed with files being overwritten.
> 2. HINT: This has been observed with files being overwritten, old
> (2.6.x) buggy Linux kernels.
> 3. HINT: This has been observed with files being overwritten, old
> (2.6.x) buggy Linux kernels, corruption or other non-core PostgreSQL
> bugs.
> 4. HINT: This has been observed with files being overwritten, buggy
> kernels and potentially other external file system influence.

FWIW, I think we should just drop the HINT. We really have no clue what caused
the error, and a HINT should imo have at least some value beyond "*Shrug*",
which is pretty much what these HINTs would say if they were a bit more blunt.

Greetings,

Andres Freund
