From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | pgsql-hackers(at)postgresql(dot)org, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com> |
Subject: | Re: Some thoughts on NFS |
Date: | 2019-02-19 22:25:22 |
Message-ID: | CA+hUKGJ3J_ZYKpOFM9EF2BOA8y71MfP5_ipLPsSwpB+dTt+GBQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Feb 20, 2019 at 5:52 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > 1. Figure out how to get the ALLOCATE command all the way through the
> > stack from PostgreSQL to the remote NFS server, and know for sure that
> > it really happened. On the Debian buster Linux 4.18 system I checked,
> > fallocate() reports EOPNOTSUPP, and posix_fallocate()
> > appears to succeed but it doesn't really do anything at all (though I
> > understand that some versions sometimes write zeros to simulate
> > allocation, which in this case would be equally useless as it doesn't
> > reserve anything on an NFS server). We need the server and NFS client
> > and libc to be of the right version and cooperate and tell us that
> > they have really truly reserved space, but there isn't currently a way
> > as far as I can tell. How can we achieve that, without writing our
> > own NFS client?
> >
> > 2. Deal with the resulting performance suckage. Extending 8kb at a
> > time with synchronous network round trips won't fly.
>
> I think I'd just go for fsync();pwrite();fsync(); as the extension
> mechanism, iff we're detecting a tablespace is on NFS. The first fsync()
> to make sure there's no previous errors that we could mistake for
> ENOSPC, the pwrite to extend, the second fsync to make sure there's
> actually space. Then we can detect ENOSPC properly. That possibly does
> leave some errors where we could mistake ENOSPC as something more benign
> than it is, but the cases seem pretty narrow, due to the previous
> fsync() (maybe the other side could be thinly provisioned and get an
> ENOSPC there - but in that case we didn't actually lose any data. The
> only dangerous scenario I can come up with is that the remote side is on
> a thinly provisioned CoW system, and a concurrent write to an earlier
> block runs out of space - but seriously, good riddance to you).
This seems to make sense, and has the advantage that it uses
interfaces that exist right now. But it seems a bit like we'll have
to wait for them to finish building out the errseq_t support for NFS
to avoid various races around the mapping's AS_EIO flag (A: fsync() ->
EIO, B: fsync() -> SUCCESS, log checkpoint; A: panic), and then maybe
we'd have to get at least one of { fd-passing, direct IO, threads }
working on our side ...
--
Thomas Munro
https://enterprisedb.com