Re: EINTR in ftruncate()

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Chris Travers <chris(dot)travers(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: EINTR in ftruncate()
Date: 2022-07-01 21:52:33
Message-ID: CA+hUKGJc_PTA_EdPpOVHB-kF9TDZGruUP5XY3Ky6oTV1p1aK1A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jul 2, 2022 at 9:06 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2022-07-01 13:29:44 -0700, Andres Freund wrote:
> > On 2022-07-01 19:55:16 +0200, Alvaro Herrera wrote:
> > > Allow DSM allocation to be interrupted.
> > >
> > > Chris Travers reported that the startup process can repeatedly try to
> > > cancel a backend that is in a posix_fallocate()/EINTR loop and cause it
> > > to loop forever. Teach the retry loop to give up if an interrupt is
> > > pending. Don't actually check for interrupts in that loop though,
> > > because a non-local exit would skip some clean-up code in the caller.
> >
> > That whole approach seems quite wrong to me. At the absolute very least the
> > code needs to check if interrupts are being processed in the current context
> > before just giving up due to ProcDiePending || QueryCancelPending.
> >
> > I'm very unconvinced this ought to be fixed in dsm_impl_posix_resize(), rather
> > than the startup process signalling.

I agree it's not great. It was a back-patchable bandaid in need of a
better solution.

> Chris, do you have any additional details about the machine that lead to this
> change? OS version, whether it might have been swapping, etc?
>
> I wonder if what happened is that posix_fallocate() used glibc's fallback
> implementation because the kernel was old enough to not support fallocate()
> for tmpfs. Looks like support for fallocate() for tmpfs was added in 3.5
> ([1]). So e.g. a rhel 6 wouldn't have had that.

With a quick test program on my Linux 5.10 kernel I see that an
SA_RESTART signal handler definitely causes posix_fallocate() to
return EINTR (can post trivial program).

A drive-by look at the current/modern kernel source supports this:
shmem_fallocate returns -EINTR directly (not -ERESTARTSYS, which seems
to be the Linux-y way to say you want EINTR or restart as
appropriate?), and it also undoes all partial progress too (not too
surprising), which would explain why a perfectly timed machine gun
stream of signals from our recovery conflict system can make an
fallocate retry loop never terminate, for large enough sizes.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2022-07-01 21:56:42 Time to remove unparenthesized syntax for VACUUM?
Previous Message Nathan Bossart 2022-07-01 21:50:54 Re: pg_checkpointer is not a verb or verb phrase