
Re: Refactoring the checkpointer's fsync request queue

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, sdn(at)amazon(dot)com, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Subject:	Re: Refactoring the checkpointer's fsync request queue
Date:	2018-11-13 18:43:30
Message-ID:	CA+TgmoYVEAUxNGwdsBJ8BXh5UwnsnixuDvZ_tugunkgF-AG+NA@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Nov 13, 2018 at 1:07 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2018-11-13 12:04:23 -0500, Robert Haas wrote:
> > I still feel like this whole pass-the-fds-to-the-checkpointer thing is
> > a bit of a fool's errand, though. I mean, there's no guarantee that
> > the first FD that gets passed to the checkpointer is the first one
> > opened, or even the first one written, is there?
> I'm not sure I understand the danger you're seeing here. It doesn't have
> to be the first fd opened, it has to be an fd that's older than all the
> writes that we need to ensure made it to disk. And that ought to be
> guaranteed by the logic? Between the FileWrite() and the
> register_dirty_segment() (and other relevant paths) the FD cannot be
> closed.

Suppose backend A and backend B open a segment around the same time.
Is it possible that backend A does a write before backend B, but
backend B's copy of the fd reaches the checkpointer before backend A's
copy? If you send the FD to the checkpointer before writing anything
then I think it's fine, but if you write first and then send the FD to
the checkpointer I don't see what guarantees the ordering.
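
To make the scenario concrete, here is a minimal sketch in C. It is a toy model with logical timestamps, not PostgreSQL code: the struct, the function names, and the assumption that the checkpointer keeps only the first fd it receives per segment are all hypothetical. It checks the condition stated in the quoted text above, namely that the retained fd must be older than every write that needs to reach disk.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model, not PostgreSQL code.  Each field is a logical
 * timestamp (smaller means earlier) for one backend and one segment.
 */
typedef struct
{
    int opened_at;  /* backend opened its fd for the segment */
    int wrote_at;   /* backend wrote through that fd */
    int sent_at;    /* the fd copy reached the checkpointer */
} BackendEvents;

/* Index of the backend whose fd reached the checkpointer first. */
int
first_arrival(const BackendEvents *b, int n)
{
    int first = 0;

    assert(n > 0);
    for (int i = 1; i < n; i++)
    {
        if (b[i].sent_at < b[first].sent_at)
            first = i;
    }
    return first;
}

/*
 * Assumption: the checkpointer retains only the first fd it receives.
 * Returns true if that fd was opened before every write in the set,
 * i.e. it is "older than all the writes".
 */
bool
retained_fd_covers_all_writes(const BackendEvents *b, int n)
{
    int kept = first_arrival(b, n);

    for (int i = 0; i < n; i++)
    {
        if (b[i].wrote_at < b[kept].opened_at)
            return false;
    }
    return true;
}
```

With write-then-send, the events {opened 1, wrote 2, sent 6} for A and {opened 3, wrote 4, sent 5} for B make the check fail: B's fd arrives first but was opened after A's write. With send-then-write, e.g. {opened 1, wrote 3, sent 2} and {opened 4, wrote 6, sent 5}, it passes, because the first fd to arrive was necessarily opened before anyone wrote.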

> > It seems like if you wanted to make this work reliably, you'd need to
> > do it the other way around: have the checkpointer (or some other
> > background process) open all the FDs, and anybody else who wants to
> > have one open get it from the checkpointer.
>
> That'd require a process context switch for each FD opened, which seems
> clearly like a no-go?

I don't know how bad that would be. But hey, no cost is too great to
pay as a workaround for insane kernel semantics, right?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: Refactoring the checkpointer's fsync request queue at 2018-11-13 18:07:05 from Andres Freund

Responses

Re: Refactoring the checkpointer's fsync request queue at 2018-11-13 23:43:59 from Thomas Munro
