Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Avoid erroring out when unable to remove or parse logical rewrite files to save checkpoint work
Date: 2022-02-03 04:15:08
Message-ID: CALj2ACXGwc5UBx1Jc2wacqEM1oBmFHjRHxvTDA+X48sE7ObcFQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 3, 2022 at 12:07 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> On Wed, Feb 02, 2022 at 05:19:26PM +0530, Bharath Rupireddy wrote:
> > On Wed, Feb 2, 2022 at 5:25 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> >> However, I'm not sure about the change to ReadDirExtended(). That might be
> >> okay for CheckPointSnapBuild(), which is just trying to remove old files,
> >> but CheckPointLogicalRewriteHeap() is responsible for ensuring that files
> >> are flushed to disk for the checkpoint. If we stop reading the directory
> >> after an error and let the checkpoint continue, isn't it possible that some
> >> mappings files won't be persisted to disk?
> >
> > Unless I mis-read your above statement, with LOG level in
> > ReadDirExtended, I don't think we stop reading the files in
> > CheckPointLogicalRewriteHeap. Am I missing something here?
>
> ReadDirExtended() has the following comment:
>
> * If elevel < ERROR, returns NULL after any error. With the normal coding
> * pattern, this will result in falling out of the loop immediately as
> * though the directory contained no (more) entries.
>
> If there is a problem reading the directory, we will LOG and then exit the
> loop. If we didn't scan through all the entries in the directory, there is
> a chance that we didn't fsync() all the files that need it.

Thanks. I get it. For syncing map files, we don't want to tolerate any
errors, whereas removal of the old map files (lesser than cutoff LSN)
can be tolerated in CheckPointLogicalRewriteHeap.

Here's the v7 version using ReadDir for CheckPointLogicalRewriteHeap.

Regards,
Bharath Rupireddy.

Attachment Content-Type Size
v7-0001-Replace-ReadDir-with-ReadDirExtended.patch application/octet-stream 4.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2022-02-03 04:25:27 Re: support for CREATE MODULE
Previous Message Justin Pryzby 2022-02-03 03:58:28 Re: Adding CI to our tree