From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
Cc: | Peter Geoghegan <pg(at)bowt(dot)ie>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com> |
Subject: | Re: [HACKERS] Clock with Adaptive Replacement |
Date: | 2018-05-01 00:19:49 |
Message-ID: | 20180501001949.u32mdyxc6xjnqqxs@alap3.anarazel.de |
Lists: | pgsql-hackers |
On 2018-05-01 12:15:21 +1200, Thomas Munro wrote:
> On Thu, Apr 26, 2018 at 1:31 PM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> > ... I
> > suppose when you read a page in, you could tell the kernel that you
> > POSIX_FADV_DONTNEED it, and when you steal a clean PG buffer you could
> > tell the kernel that you POSIX_FADV_WILLNEED its former contents (in
> > advance somehow), on the theory that the coldest stuff in the PG cache
> > should now become the hottest stuff in the OS cache. Of course that
> > sucks, because the best the kernel can do then is go and read it from
> > disk, and the goal is to avoid IO. Given a hypothetical way to
> > "write" "clean" data to the kernel (so it wouldn't mark it dirty and
> > generate IO, but it would let you read it back without generating IO
> > if you're lucky), then perhaps you could actually achieve exclusive
> > caching at the two levels, and use all your physical RAM without
> > duplication.
>
> Craig said essentially the same thing, on the nearby fsync() reliability thread:
>
> On Sun, Apr 29, 2018 at 1:50 PM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> > ... I'd kind of hoped to go in
> > the other direction if anything, with some kind of pseudo-write op
> > that let us swap a dirty shared_buffers entry from our shared_buffers
> > into the OS dirty buffer cache (on Linux at least) and let it handle
> > writeback, so we reduce double-buffering. Ha! So much for that!
>
> I would like to reply to that on this thread which discusses double
> buffering and performance, to avoid distracting the fsync() thread
> from its main topic of reliability.
It's not going to happen. Robert and I talked to the kernel devs a
couple years back, and I've brought it up again. The kernel has
absolutely no way to verify the content of data written that way, meaning
that suddenly you'd get differing data depending on cache pressure. It seems
unsurprising that kernel devs aren't wild about that idea. The
likelihood of that opening up weird exploits (imagine a suid binary
reading such data later!) also seems pretty high.
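
For readers following along, the fadvise-only variant from Thomas's quoted message can be sketched roughly as below. This is a minimal illustration in Python rather than PostgreSQL's C, and it is not how PostgreSQL manages its buffers; `BLOCK_SIZE`, `read_block_into_pool`, and `evict_clean_block` are hypothetical names chosen for the example. As the quoted text already notes, the WILLNEED hint can at best make the kernel read the block back from disk, so it does not avoid IO, and the proposed "write clean data" operation has no real API that could be shown here.

```python
import os

# Assumption: 8 kB blocks, matching PostgreSQL's default page size.
BLOCK_SIZE = 8192


def _advise(fd, block_no, advice_name):
    """Send one posix_fadvise hint for a block; return False if unsupported."""
    advice = getattr(os, advice_name, None)
    if advice is None or not hasattr(os, "posix_fadvise"):
        return False  # platform without posix_fadvise
    try:
        os.posix_fadvise(fd, block_no * BLOCK_SIZE, BLOCK_SIZE, advice)
    except OSError:
        return False  # the kernel or filesystem rejected the hint
    return True


def read_block_into_pool(fd, block_no):
    """Read a block, then hint that the kernel may drop its cached copy."""
    data = os.pread(fd, BLOCK_SIZE, block_no * BLOCK_SIZE)
    _advise(fd, block_no, "POSIX_FADV_DONTNEED")
    return data


def evict_clean_block(fd, block_no):
    """When a clean buffer is stolen, hint that the kernel should cache it.

    The kernel can only satisfy this by reading the block from disk again.
    """
    return _advise(fd, block_no, "POSIX_FADV_WILLNEED")
```

Both calls are advisory only: the kernel is free to ignore them, and DONTNEED typically does not drop pages that are still dirty.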
Greetings,
Andres Freund