From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, sdn(at)amazon(dot)com, Dmitry Dolgov <9erthalion6(at)gmail(dot)com> |
Subject: | Re: Refactoring the checkpointer's fsync request queue |
Date: | 2018-11-13 18:43:30 |
Message-ID: | CA+TgmoYVEAUxNGwdsBJ8BXh5UwnsnixuDvZ_tugunkgF-AG+NA@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Nov 13, 2018 at 1:07 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2018-11-13 12:04:23 -0500, Robert Haas wrote:
> > I still feel like this whole pass-the-fds-to-the-checkpointer thing is
> > a bit of a fool's errand, though. I mean, there's no guarantee that
> > the first FD that gets passed to the checkpointer is the first one
> > opened, or even the first one written, is there?
> I'm not sure I understand the danger you're seeing here. It doesn't have
> to be the first fd opened, it has to be an fd that's older than all the
> writes that we need to ensure made it to disk. And that ought to be
> guaranteed by the logic? Between the FileWrite() and the
> register_dirty_segment() (and other relevant paths) the FD cannot be
> closed.
Suppose backend A and backend B open a segment around the same time.
Is it possible that backend A does a write before backend B, but
backend B's copy of the fd reaches the checkpointer before backend A's
copy? If you send the FD to the checkpointer before writing anything
then I think it's fine, but if you write first and then send the FD to
the checkpointer I don't see what guarantees the ordering.
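
To make the ordering concern concrete, here is a toy model in C. It is not PostgreSQL code: the struct, the function name, and the policy that the checkpointer retains whichever fd arrives first are all assumptions made for illustration. Times are logical ticks on a single shared timeline.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical timeline for one backend working on one segment. */
typedef struct
{
    int open_time;    /* tick at which the backend opened its fd */
    int write_time;   /* tick at which the backend wrote through it */
    int arrive_time;  /* tick at which its fd reached the checkpointer */
} BackendEvents;

/*
 * Assumed policy: the checkpointer keeps the first fd to arrive.
 * Returns true if that fd was opened before every write to the segment,
 * i.e. it is "older than all the writes" it is supposed to cover.
 */
bool
first_arrival_predates_all_writes(const BackendEvents *b, int n)
{
    int first = 0;
    int earliest_write;

    assert(n > 0);
    earliest_write = b[0].write_time;
    for (int i = 1; i < n; i++)
    {
        if (b[i].arrive_time < b[first].arrive_time)
            first = i;
        if (b[i].write_time < earliest_write)
            earliest_write = b[i].write_time;
    }
    return b[first].open_time < earliest_write;
}
```

With write-then-send, A = {open 1, write 3, arrive 7} and B = {open 4, write 5, arrive 6} leaves the checkpointer holding B's fd, which was opened after A's write, so the check fails. If every backend's fd arrives before that backend writes, the first arrival necessarily predates every write and the check passes.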
> > It seems like if you wanted to make this work reliably, you'd need to
> > do it the other way around: have the checkpointer (or some other
> > background process) open all the FDs, and anybody else who wants to
> > have one open get it from the checkpointer.
>
> That'd require a process context switch for each FD opened, which seems
> clearly like a no-go?
I don't know how bad that would be. But hey, no cost is too great to
pay as a workaround for insane kernel semantics, right?
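
For reference, handing an open descriptor from one process to another (in either direction) is done on Unix with an SCM_RIGHTS control message over a Unix-domain socket. A minimal sketch follows; send_fd and recv_fd are hypothetical helper names, not functions from the patch or from PostgreSQL, and error handling is reduced to a -1 return.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Control buffer sized and aligned for exactly one descriptor. */
typedef union
{
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
} FdControl;

/* Send one open descriptor over a connected Unix-domain socket. */
int
send_fd(int sock, int fd)
{
    char payload = 'F';   /* one byte of ordinary data accompanies the fd */
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    FdControl u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(fd >= 0);
    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; returns the new fd, or -1 on failure. */
int
recv_fd(int sock)
{
    char payload;
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    FdControl u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL ||
        cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets a new descriptor number that refers to the same open file description as the sender's. In the checkpointer-opens-everything arrangement, each open in a backend would cost a round trip like this, which is the context-switch overhead objected to above.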
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company