From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Sean Chittenden <seanc(at)joyent(dot)com> |
Subject: | Re: WAL prefetch |
Date: | 2018-06-16 19:02:10 |
Message-ID: | 20180616190210.pqz42a5nxhqy7jw6@alap3.anarazel.de |
Lists: | pgsql-hackers |
On 2018-06-16 11:38:59 +0200, Tomas Vondra wrote:
>
>
> On 06/15/2018 08:01 PM, Andres Freund wrote:
> > On 2018-06-14 10:13:44 +0300, Konstantin Knizhnik wrote:
> > >
> > >
> > > On 14.06.2018 09:52, Thomas Munro wrote:
> > > > On Thu, Jun 14, 2018 at 1:09 AM, Konstantin Knizhnik
> > > > <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> > > > > pg_wal_prefetch function will infinitely traverse WAL and prefetch block
> > > > > references in WAL records
> > > > > using posix_fadvise(WILLNEED) system call.
> > > > Hi Konstantin,
> > > >
> > > > Why stop at the page cache... what about shared buffers?
> > > >
> > >
> > > It is good question. I thought a lot about prefetching directly to shared
> > > buffers.
> >
> > I think that's definitely how this should work. I'm pretty strongly
> > opposed to a prefetching implementation that doesn't read into s_b.
> >
>
> Could you elaborate why prefetching into s_b is so much better (I'm sure it
> has advantages, but I suppose prefetching into page cache would be much
> easier to implement).
I think there's a number of issues with just issuing prefetch requests
via fadvise etc:
- it leads to guaranteed double buffering, in a way that's just about
guaranteed to *never* be useful. Because we'd only prefetch whenever
there's an upcoming write, there's simply no benefit in the page
staying in the page cache - we'll write out the whole page back to the
OS.
- reading from the page cache is far from free - so you add costs to the
replay process that it doesn't need to incur.
- you don't have any sort of completion notification, so you basically
just have to guess how far ahead you want to read. If you read a bit
too much you suddenly get into synchronous blocking land.
- The OS page cache is actually not particularly scalable to large amounts
of data either. Nor are the decisions about what to keep cached likely to be
particularly useful.
- We imo need to add support for direct IO before long, and adding more
and more work to reach feature parity strikes me as a bad move.
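To make the first three points concrete, here is a minimal sketch of what
fadvise-based prefetching looks like from the replay side. This is not code
from the proposed patch: prefetch_block, read_block and BLOCK_SIZE are
hypothetical names, and the 8192-byte block size is assumed to match the
PostgreSQL default. It assumes a POSIX system that provides posix_fadvise.

```c
#define _XOPEN_SOURCE 600   /* expose posix_fadvise() and pread() */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192     /* assumed: PostgreSQL's default block size */

/*
 * Hint that block `blkno` of `fd` will be needed soon.
 * Returns 0 if the hint was accepted, otherwise an errno value.
 * A return of 0 says nothing about whether the data is in memory yet:
 * there is no completion notification, so the caller can only guess
 * how far ahead of replay it should issue these hints.
 */
static int
prefetch_block(int fd, off_t blkno)
{
    assert(blkno >= 0);
    return posix_fadvise(fd, blkno * BLOCK_SIZE, BLOCK_SIZE,
                         POSIX_FADV_WILLNEED);
}

/*
 * Read block `blkno` into `buf`, which stands in for a shared buffer.
 * Even when the hint above worked, this still copies the page out of
 * the OS page cache, so the page now exists in memory twice.
 * If the prefetch has not finished, this call blocks synchronously.
 */
static ssize_t
read_block(int fd, off_t blkno, char *buf)
{
    return pread(fd, buf, BLOCK_SIZE, blkno * BLOCK_SIZE);
}
```

A caller would issue prefetch_block() for blocks referenced by upcoming WAL
records and call read_block() when replay reaches them. Nothing in this
interface tells the caller when the earlier request has actually completed.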
Greetings,
Andres Freund