Re: WAL prefetch

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc:	Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Sean Chittenden <seanc(at)joyent(dot)com>
Subject:	Re: WAL prefetch
Date:	2018-06-16 19:02:10
Message-ID:	20180616190210.pqz42a5nxhqy7jw6@alap3.anarazel.de
Lists:	pgsql-hackers

On 2018-06-16 11:38:59 +0200, Tomas Vondra wrote:
>
>
> On 06/15/2018 08:01 PM, Andres Freund wrote:
> > On 2018-06-14 10:13:44 +0300, Konstantin Knizhnik wrote:
> > >
> > >
> > > On 14.06.2018 09:52, Thomas Munro wrote:
> > > > On Thu, Jun 14, 2018 at 1:09 AM, Konstantin Knizhnik
> > > > <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> > > > > pg_wal_prefetch function will infinitely traverse WAL and prefetch block
> > > > > references in WAL records
> > > > > using posix_fadvise(WILLNEED) system call.
> > > > Hi Konstantin,
> > > >
> > > > Why stop at the page cache... what about shared buffers?
> > > >
> > >
> > > It is good question. I thought a lot about prefetching directly to shared
> > > buffers.
> >
> > I think that's definitely how this should work. I'm pretty strongly
> > opposed to a prefetching implementation that doesn't read into s_b.
> >
>
> Could you elaborate why prefetching into s_b is so much better (I'm sure it
> has advantages, but I suppose prefetching into page cache would be much
> easier to implement).

I think there's a number of issues with just issuing prefetch requests
via fadvise etc:

- it leads to guaranteed double buffering, in a way that's just about
guaranteed to *never* be useful. Because we'd only prefetch whenever
there's an upcoming write, there's simply no benefit in the page
staying in the page cache - we'll write out the whole page back to the
OS.
- reading from the page cache is far from free - so you add costs to the
replay process that it doesn't need to incur.
- you don't have any sort of completion notification, so you basically
just have to guess how far ahead you want to read. If you read a bit
too much you suddenly get into synchronous blocking land.
- The OS page cache is actually not particularly scalable to large amounts of
data either. Nor are the decisions what to keep cached likely to be
particularly useful.
- We imo need to add support for direct IO before long, and adding more
and more work to reach feature parity strikes me as a bad move.
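
To make the fadvise-related points above concrete, here is a minimal sketch
in Python (not code from the proposed patch) of what prefetching through the
page cache looks like. The names prefetch_blocks and read_block are
hypothetical, and the 8192-byte block size is assumed to match PostgreSQL's
default. The call returns as soon as the hint is queued, so the caller gets
no completion notification, and the later read still copies the page out of
the page cache into a second buffer.

```python
import os

BLOCK_SIZE = 8192  # assumption: PostgreSQL's default block size


def prefetch_blocks(fd, block_numbers, block_size=BLOCK_SIZE):
    """Hint the kernel to load the given blocks into the OS page cache.

    Returns the number of hints issued. posix_fadvise() only queues the
    request: nothing reports when, or whether, the reads complete.
    """
    issued = 0
    for blkno in block_numbers:
        os.posix_fadvise(fd, blkno * block_size, block_size,
                         os.POSIX_FADV_WILLNEED)
        issued += 1
    return issued


def read_block(fd, blkno, block_size=BLOCK_SIZE):
    """Read one block; blocks synchronously if the prefetch has not finished.

    Even on a page cache hit, the data is copied into a new user-space
    buffer, which is the double buffering described above.
    """
    return os.pread(fd, block_size, blkno * block_size)
```

A caller would issue prefetch_blocks(fd, [...]) some distance ahead of the
blocks it is about to read with read_block(fd, blkno); how far ahead is a
guess, since there is no way to ask whether a given hint has been satisfied.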

Greetings,

Andres Freund

In response to

Re: WAL prefetch at 2018-06-16 09:38:59 from Tomas Vondra

Responses

Re: WAL prefetch at 2018-06-16 19:34:30 from Tomas Vondra
Re: WAL prefetch at 2018-06-16 20:25:34 from Konstantin Knizhnik
