Re: Checking for missing heap/index files

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Checking for missing heap/index files
Date: 2022-06-13 20:06:12
Message-ID: CA+TgmoZA-rT7E_iY+iZLqQpQf0Ps8-svg1ttKDm25CzOUKDBVg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 8, 2022 at 8:46 AM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> We currently can check for missing heap/index files by comparing
> pg_class with the database directory files. However, I am not clear if
> this is safe during concurrent DDL. I assume we create the file before
> the update to pg_class is visible, but do we always delete the file
> after the update to pg_class is visible? I assume any external checking
> tool would need to lock the relation to prevent concurrent DDL.

If you see an entry in pg_class, then there should definitely be a
file present on disk. The reverse is not true: just because you don't
see an entry in pg_class for a file that's on disk doesn't mean it's
safe to remove that file.
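
As a rough illustration of that asymmetry, here is a minimal Python sketch of the comparison. It assumes the set of expected filenodes has already been collected from the catalog by some other means, and that the file names come from a listing of the database directory. The function name and the regular expression are hypothetical and are not part of any existing tool. Files with no catalog entry are only reported as unaccounted for, never as removable.

```python
import re

# Relation files are named by filenode, e.g. "16384" or "16384.1" for later
# segments; other forks carry a suffix such as "_fsm", "_vm" or "_init".
_REL_FILE = re.compile(r"^(\d+)(?:_(fsm|vm|init))?(?:\.(\d+))?$")


def compare_catalog_to_directory(expected_filenodes, filenames):
    """Return (missing, unaccounted).

    missing     -- expected filenodes with no first main-fork file on disk
    unaccounted -- relation-like files whose filenode is not expected;
                   this does NOT mean they are safe to delete
    """
    expected = set(expected_filenodes)
    seen_main = set()
    unaccounted = []
    for name in filenames:
        m = _REL_FILE.match(name)
        if m is None:
            continue  # not a relation file name (e.g. PG_VERSION)
        filenode = int(m.group(1))
        fork, segment = m.group(2), m.group(3)
        if filenode not in expected:
            unaccounted.append(name)
        elif fork is None and segment is None:
            seen_main.add(filenode)  # first segment of the main fork
    missing = sorted(expected - seen_main)
    return missing, sorted(unaccounted)


if __name__ == "__main__":
    files = ["16384", "16384_fsm", "16390", "PG_VERSION"]
    print(compare_catalog_to_directory({16384, 16385}, files))
```

A check like this is only meaningful if the catalog snapshot and the directory listing are consistent with each other, which is exactly the concurrent-DDL concern raised above.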

> Also, how would it check if the number of extents is correct? Seems we
> would need this value to be in pg_class, and have the same update
> protections outlined above. Seems that would require heavier locking.

Yeah, and it's not just the number of extents but the length of the
last one. If the last extent is supposed to be 700MB and it gets
truncated to 200MB, it would be nice if we could notice that.
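
To make the truncation case concrete, here is a small Python sketch of the check that would become possible if the expected total length were recorded somewhere. No such recorded value exists today, so `total_bytes` is a hypothetical input, and the function names are made up for illustration. The sketch assumes the default 1GB segment size.

```python
SEGMENT_BYTES = 1024 ** 3  # assumed default segment size (1GB)


def expected_segment_lengths(total_bytes, segment_bytes=SEGMENT_BYTES):
    """Split a total relation length into per-segment lengths."""
    full, rest = divmod(total_bytes, segment_bytes)
    lengths = [segment_bytes] * full
    if rest:
        lengths.append(rest)  # final, partially filled segment
    return lengths


def find_length_mismatches(total_bytes, actual_lengths,
                           segment_bytes=SEGMENT_BYTES):
    """Return (segment_number, expected, actual) for each differing segment.

    A segment that is absent on either side is treated as length 0.
    """
    expected = expected_segment_lengths(total_bytes, segment_bytes)
    mismatches = []
    for i in range(max(len(expected), len(actual_lengths))):
        want = expected[i] if i < len(expected) else 0
        have = actual_lengths[i] if i < len(actual_lengths) else 0
        if want != have:
            mismatches.append((i, want, have))
    return mismatches


if __name__ == "__main__":
    MB = 1024 ** 2
    total = 2 * SEGMENT_BYTES + 700 * MB
    on_disk = [SEGMENT_BYTES, SEGMENT_BYTES, 200 * MB]  # last one truncated
    print(find_length_mismatches(total, on_disk))
```

With only an upper bound on the length rather than an exact value, the comparison would have to be relaxed accordingly; this sketch covers the exact case only.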

One idea might be for each heap table to have a metapage and store the
length - or an upper bound on the length - in the metapage. That'd
probably be cheaper than updating pg_class, but might still be
expensive in some scenarios, and it's a fairly large amount of
engineering.

> Is this something anyone has even needed or had requested?

Definitely. And also the reverse: figuring out which files on disk are
old garbage that can be safely nuked.

--
Robert Haas
EDB: http://www.enterprisedb.com
