Re: Better HINT message for "unexpected data beyond EOF"

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Better HINT message for "unexpected data beyond EOF"
Date: 2025-03-27 09:25:50
Message-ID: CAKZiRmwFoaymHZZedNbdTQhDZNmuoA2JRKOrtjQbG+Y=UBN61g@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 26, 2025 at 4:01 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
[..]
> > so how about:
> > -HINT: This has been seen to occur with buggy kernels; consider
> > updating your system.
> > +HINT: This has been observed with files being overwritten, buggy
> > kernels and potentially other external file system influence.
>
> I agree that we should emphasize the possibility of files being
> overwritten.

> I'm not sure we should even mention buggy kernels -- is
> there any evidence that's still a thing on still-running hardware?

No, I have none, other than Tom's comments in the source code.

> I don't really like "other external file system influence" because that
> sounds like vague weasel-wording.

That was somewhat intentional: I did not want to rule out any
external factor(s), so I stated it as vaguely as possible to stay
generic. From the perspective of the core server itself this is
literally "paranormal" / "rogue" activity (another entity opening and
overwriting data files), but I suppose bugs or, in some cases, fs
corruption could cause it too?

E.g. I've tracked down that Pavan fixed something in the 2ndQ
fast_redo/pg_xlog_prefetch extension in 2016, where a concurrency
bug in that extension was causing a similar problem on at
least one occasion: "...issue was caused because the prefetch worker
process reading back blocks that are being concurrently dropped by the
startup process (as a result of truncate operation). When the startup
process later tries to extend the relation, it finds an existing valid
block in the shared buffers and panics." (sounds like it is related
to data beyond EOF).
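
To make the quoted scenario concrete, here is a toy model of it. This is
only a sketch and not PostgreSQL code: ToyRelation, prefetch(),
truncate(), extend() and simulate() are hypothetical names invented for
illustration, and the real buffer manager is far more involved.

```python
# Toy model only: none of these names exist in PostgreSQL.

class UnexpectedDataBeyondEOF(Exception):
    pass


class ToyRelation:
    def __init__(self, nblocks):
        self.nblocks = nblocks      # relation size as seen by the server
        self.shared_buffers = {}    # block number -> "holds a valid page"

    def prefetch(self, blkno):
        # A prefetch worker caches a block it has just read.
        self.shared_buffers[blkno] = True

    def truncate(self, new_nblocks):
        # Shrink the relation and drop cached blocks past the new EOF.
        for blkno in [b for b in self.shared_buffers if b >= new_nblocks]:
            del self.shared_buffers[blkno]
        self.nblocks = new_nblocks

    def extend(self):
        # The block being added lies past EOF, so no valid copy of it
        # should already be sitting in the buffer cache.
        blkno = self.nblocks
        if self.shared_buffers.get(blkno):
            raise UnexpectedDataBeyondEOF(
                f"unexpected data beyond EOF in block {blkno}")
        self.shared_buffers[blkno] = True
        self.nblocks += 1
        return blkno


def simulate(racy):
    rel = ToyRelation(nblocks=10)
    rel.prefetch(7)
    rel.truncate(5)          # drops the cached copy of block 7
    if racy:
        rel.prefetch(5)      # worker reads back a block just dropped
    try:
        return rel.extend()
    except UnexpectedDataBeyondEOF as exc:
        return str(exc)
```

In this model simulate(False) extends the relation normally, while
simulate(True) trips the check, because the racing prefetch leaves a
valid block cached past the new end of the relation.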

Proposals:
1. HINT: This has been observed with files being overwritten.
2. HINT: This has been observed with files being overwritten, old
(2.6.x) buggy Linux kernels.
3. HINT: This has been observed with files being overwritten, old
(2.6.x) buggy Linux kernels, corruption or other non-core PostgreSQL
bugs.
4. HINT: This has been observed with files being overwritten, buggy
kernels and potentially other external file system influence.

TBH, anything that simply avoids blaming kernel folks directly would
be better, but as a non-native speaker I'm finding it a little hard to
articulate.

-J.
