From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | Chris Travers <chris(dot)travers(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: EINTR in ftruncate() |
Date: | 2022-07-01 22:17:22 |
Message-ID: | 20220701221722.os4ktbe5pnciqguv@alap3.anarazel.de |
Lists: | pgsql-hackers |
Hi,
On 2022-07-02 09:52:33 +1200, Thomas Munro wrote:
> On Sat, Jul 2, 2022 at 9:06 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2022-07-01 13:29:44 -0700, Andres Freund wrote:
> > Chris, do you have any additional details about the machine that led to this
> > change? OS version, whether it might have been swapping, etc?
> >
> > I wonder if what happened is that posix_fallocate() used glibc's fallback
> > implementation because the kernel was old enough to not support fallocate()
> > for tmpfs. Looks like support for fallocate() for tmpfs was added in 3.5
> > ([1]). So e.g. RHEL 6 wouldn't have had that.
>
> With a quick test program on my Linux 5.10 kernel I see that an
> SA_RESTART signal handler definitely causes posix_fallocate() to
> return EINTR (can post trivial program).
>
> A drive-by look at the current/modern kernel source supports this:
> shmem_fallocate returns -EINTR directly (not -ERESTARTSYS, which seems
> to be the Linux-y way to say you want EINTR or restart as
> appropriate?), and it also undoes all partial progress too (not too
> surprising), which would explain why a perfectly timed machine gun
> stream of signals from our recovery conflict system can make an
> fallocate retry loop never terminate, for large enough sizes.
Yea :(
And even if we fix recovery to not douse other processes in signals quite
that badly, there are plenty of other sources of signals that can arrive at a
steady clip. So I think we need to do something to defuse this another way.
Ideas:
1) do the fallocate in smaller chunks, thereby making it much more likely to
complete between two signal deliveries
2) block signals while calling posix_fallocate(). That won't work for
everything (e.g. rapid SIGSTOP/SIGCONT), but that's not something we'd send
ourselves, so whatever.
3) 1+2
4) ?
Greetings,
Andres Freund