Re: checkpointer: PANIC: could not fsync file: No such file or directory

From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Shawn Debnath <sdn(at)amazon(dot)com>
Subject: Re: checkpointer: PANIC: could not fsync file: No such file or directory
Date: 2024-12-13 20:55:42
Message-ID: nyj4k7yur5t27rtygvx2i2lrlp6rqfvvhoiiwx4fznynksf2et@4hj2sp42alpe
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2019-12-13 17:41:56 +1300, Thomas Munro wrote:
> From 9609c9a153232fb2de169bf76158781d354c633b Mon Sep 17 00:00:00 2001
> From: Thomas Munro <tmunro(at)postgresql(dot)org>
> Date: Fri, 13 Dec 2019 17:12:42 +1300
> Subject: [PATCH] Don't use _mdfd_getseg() in mdsyncfiletag().
>
> _mdfd_getseg() opens all segments up to the requested one. That
> causes problems for mdsyncfiletag(), if mdunlinkfork() has
> already unlinked other segment files. Open the file we want
> directly by name instead, if we don't have it already.
>
> The consequence of this bug was a rare panic in the checkpointer,
> made more likely if you saturated the sync request queue so that
> the SYNC_FORGET_REQUEST messages for a given relation were more
> likely to be absorbed in separate cycles by the checkpointer.
>
> Back-patch to 12. Defect in commit 3eb77eba.
>
> Author: Thomas Munro
> Reported-by: Justin Pryzby
> Discussion: https://postgr.es/m/20191119115759.GI30362%40telsasoft.com
> ---
> src/backend/storage/smgr/md.c | 56 ++++++++++++++++++++++++-----------
> 1 file changed, 39 insertions(+), 17 deletions(-)
>
> diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
> index 8a9eaf6430..1a7b91b523 100644
> --- a/src/backend/storage/smgr/md.c
> +++ b/src/backend/storage/smgr/md.c
> @@ -1280,25 +1280,47 @@ int
> mdsyncfiletag(const FileTag *ftag, char *path)
...
> - /* Try to open the requested segment. */
> - v = _mdfd_getseg(reln,
> - ftag->forknum,
> - ftag->segno * (BlockNumber) RELSEG_SIZE,
> - false,
> - EXTENSION_RETURN_NULL | EXTENSION_DONT_CHECK_SIZE);
> - if (v == NULL)
> - return -1;
> -
> - /* Try to fsync the file. */
> - return FileSync(v->mdfd_vfd, WAIT_EVENT_DATA_FILE_SYNC);
> + return result;
> }

This was the only user of EXTENSION_DONT_CHECK_SIZE. It's not generally safe,
as its comment says:

/*
* Allow opening segments which are preceded by segments smaller than
* RELSEG_SIZE, e.g. inactive segments (see above). Note that this breaks
* mdnblocks() and related functionality henceforth - which currently is ok,
* because this is only required in the checkpointer which never uses
* mdnblocks().
*/

But since the above change checkpointer doesn't use EXTENSION_DONT_CHECK_SIZE
anymore.

Seems we should remove this code?

Greetings,

Andres

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sergei Kornilov 2024-12-13 21:06:37 Re: Track the amount of time waiting due to cost_delay
Previous Message Vladlen Popolitov 2024-12-13 20:36:23 Re: COPY performance on Windows