From: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: subscriptionCheck failures on nightjar |
Date: | 2019-02-13 20:52:33 |
Message-ID: | CAEepm=0wB7vgztC5sg2nmJ-H3bnrBT5GQfhUzP+Ffq-WT3g8VA@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Feb 14, 2019 at 8:11 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > I was kinda pondering just open coding it. I am not yet convinced that
> > my idea of just using an open FD isn't the least bad approach for the
> > issue at hand. What precisely is the NFS issue you're concerned about?
>
> I'm not sure that fsync-on-FD after the rename will work, considering that
> the issue here is that somebody might've unlinked the file altogether
> before we get to doing the fsync. I don't have a hard time believing that
> that might result in a failure report on NFS or similar. Yeah, it's
> hypothetical, but the argument that we need a repeat fsync at all seems
> equally hypothetical.
>
> > Right now fsync_fname_ext isn't exposed outside fd.c...
>
> Mmm. That makes it easier to consider changing its API.
Just to make sure I understand: it's OK for the file not to be there
when we try to fsync it by name, because a concurrent checkpoint can
remove it, having determined that we don't need it anymore? In other
words, we really needed either missing_ok=true semantics, or to use
the fd we already had instead of the name?
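
To make those two options concrete, here is a minimal sketch in plain POSIX C. It is not the fd.c code: fsync_by_name() and fsync_by_fd() are hypothetical names made up for illustration, and the missing_ok flag only mimics the semantics being discussed, where a file removed by a concurrent checkpoint is treated as "nothing left to flush".

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration only; this is NOT fd.c's
 * fsync_fname_ext().  Returns 0 on success, -1 on failure (errno set).
 * With missing_ok, a file that has already been removed counts as success.
 */
static int
fsync_by_name(const char *path, bool missing_ok)
{
	int		fd = open(path, O_RDWR);

	if (fd < 0)
	{
		/* A concurrent unlink leaves nothing to flush. */
		if (errno == ENOENT && missing_ok)
			return 0;
		return -1;
	}

	if (fsync(fd) != 0)
	{
		int		save_errno = errno;

		close(fd);
		errno = save_errno;
		return -1;
	}

	return close(fd);
}

/*
 * Alternative: flush through a descriptor that was opened before the file
 * could be unlinked, so no name lookup happens at all.
 */
static int
fsync_by_fd(int fd)
{
	return fsync(fd);
}
```

On a local POSIX filesystem, fsync() on a descriptor whose file has since been unlinked generally still succeeds. Whether the same holds on NFS or similar is exactly the open question raised above, so the sketch does not settle it.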
I found 3 examples of this failing with an ERROR (though not turning
the BF red, so nobody noticed) before the PANIC patch went in:
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=nightjar&dt=2018-09-10%2020%3A54%3A21&stg=subscription-check
2018-09-10 17:20:09.247 EDT [23287] sub1 ERROR: could not open file
"pg_logical/snapshots/0-161D778.snap": No such file or directory
2018-09-10 17:20:09.247 EDT [23285] ERROR: could not receive data
from WAL stream: ERROR: could not open file
"pg_logical/snapshots/0-161D778.snap": No such file or directory
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=nightjar&dt=2018-08-31%2023%3A25%3A41&stg=subscription-check
2018-08-31 19:52:06.634 EDT [52724] sub1 ERROR: could not open file
"pg_logical/snapshots/0-161D718.snap": No such file or directory
2018-08-31 19:52:06.634 EDT [52721] ERROR: could not receive data
from WAL stream: ERROR: could not open file
"pg_logical/snapshots/0-161D718.snap": No such file or directory
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=nightjar&dt=2018-08-22%2021%3A49%3A18&stg=subscription-check
2018-08-22 18:10:29.422 EDT [44208] sub1 ERROR: could not open file
"pg_logical/snapshots/0-161D718.snap": No such file or directory
2018-08-22 18:10:29.422 EDT [44206] ERROR: could not receive data
from WAL stream: ERROR: could not open file
"pg_logical/snapshots/0-161D718.snap": No such file or directory
--
Thomas Munro
http://www.enterprisedb.com